I've got two lists of points, let's call them L1 = (P1(x1, y1), ..., Pn(xn, yn)) and L2 = (P'1(x'1, y'1), ..., P'n(x'n, y'n)).
My task is to find the matching between their points that minimizes the sum of the distances between matched points.
Any clue on a suitable algorithm? The two lists each contain approx. 200-300 points.
Thanks and best regards.
If your use case requires matching every point in list L1 with a point in list L2, then the Hungarian algorithm is a perfect fit.
The weights in your Hungarian cost matrix would be the distances between the point corresponding to the row and the point corresponding to the column. The overall runtime of the optimized Hungarian algorithm is O(n^3), which comfortably fits your constraint of n = 300.
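To make that concrete, here is a minimal C# sketch of building the cost matrix; HungarianSolver.Solve is just a placeholder name for whatever O(n^3) assignment implementation you end up using (for example one adapted from the tutorial linked below).

// Build the n x n matrix of pairwise Euclidean distances between L1 and L2.
// HungarianSolver.Solve is a placeholder for your assignment solver; it is
// assumed to return assignment[i] = index in l2 matched to point i of l1.
static int[] MatchPoints((double X, double Y)[] l1, (double X, double Y)[] l2)
{
    int n = l1.Length;
    var cost = new double[n, n];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            cost[i, j] = Math.Sqrt(Math.Pow(l1[i].X - l2[j].X, 2) +
                                   Math.Pow(l1[i].Y - l2[j].Y, 2));

    return HungarianSolver.Solve(cost);
}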
A pretty nice tutorial covering the theory and implementation of the Hungarian algorithm is https://www.topcoder.com/community/competitive-programming/tutorials/assignment-problem-and-hungarian-algorithm/
If you'd rather not use the Hungarian algorithm, you can also recast the problem as a min-cost max-flow problem - the details of which I'll omit for now but can discuss if required.
I have two sets of points, A and B, and I'm trying to find the closest pair of points where one point is taken from each set. That is, if you were to use the points to draw two lines, I want the two points that let me draw the shortest line segment between those lines.
Looking around, almost everything seems to deal with finding the closest pair within a single set. I did find one solution recommending starting with a Voronoi tessellation, which seems like overkill; I'm just looking for something a bit nicer than O(n^2).
If it helps, the two sets I'm comparing each form a line, although not necessarily a straight one, and I'm writing this in C#.
Thanks.
It should be possible to adapt the classical divide-and-conquer closest-pair algorithm (as described in the Wikipedia link), by processing all points together and tagging each one with an extra bit saying which set it came from.
The merging step needs to be modified to accept only those candidate left-right pairs that have one member from each set. This way, the recursive function returns the closest A-B pair, and the O(N log N) behavior should be preserved.
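A rough C# sketch of that tagging idea follows; Pt, FromA and the other names are invented for the sketch. The strip scan only accepts cross-set pairs, which keeps the result correct, though carrying the full O(N log N) bound over needs a more careful argument, since same-set points in the strip are not limited by delta.

// Classical closest-pair divide & conquer, with each point tagged by the set
// it came from; only pairs with different tags are considered.
record Pt(double X, double Y, bool FromA);

static double Dist(Pt p, Pt q) =>
    Math.Sqrt((p.X - q.X) * (p.X - q.X) + (p.Y - q.Y) * (p.Y - q.Y));

static double ClosestCrossPair(IEnumerable<Pt> points) =>
    Solve(points.OrderBy(p => p.X).ToList());

static double Solve(List<Pt> byX)
{
    int n = byX.Count;
    if (n <= 3)  // base case: brute force over cross-set pairs only
    {
        double best = double.PositiveInfinity;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (byX[i].FromA != byX[j].FromA)
                    best = Math.Min(best, Dist(byX[i], byX[j]));
        return best;
    }

    int mid = n / 2;
    double midX = byX[mid].X;
    double delta = Math.Min(Solve(byX.GetRange(0, mid)),
                            Solve(byX.GetRange(mid, n - mid)));

    // points within delta of the dividing line, sorted by Y
    var strip = byX.Where(p => Math.Abs(p.X - midX) < delta)
                   .OrderBy(p => p.Y).ToList();

    for (int i = 0; i < strip.Count; i++)
        for (int j = i + 1; j < strip.Count && strip[j].Y - strip[i].Y < delta; j++)
            if (strip[i].FromA != strip[j].FromA)        // cross-set pairs only
                delta = Math.Min(delta, Dist(strip[i], strip[j]));

    return delta;
}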
If the "lines" you mention have a known equation so that point/line distances (or even line/line intersections) can be evaluated quickly, there could be faster solutions.
To clarify my post, I have edited it based on comments.
I've been thinking about how to implement a nearest-neighbour search efficiently when edge costs are asymmetric. I'm thinking of a number of cities somewhere in the range of 100 to 12000.
In more detail, as an example, there's a cost COST1 for travelling from city A to city B, e.g. by foot, and a cost COST1/10 for travelling from B to A, e.g. by train. In other words, the problem I see here is: if I have an asymmetric matrix C representing the travel costs between cities and I select one point A, how could I efficiently discover, say, the three nearest neighbouring cities B1, B2 and B3 in terms of travelling cost? I would like to run such queries repeatedly. Preprocessing time, if not huge, is all right.
Pondering efficiency led me to something like a k-d tree, which facilitates finding k nearest neighbours in O(log n) time when the costs between cities are symmetric. That is the snag with a basic k-d tree in my case, as the travelling costs aren't in general the same in both directions between any two cities. The gist of the matter, then, is: how could I do something like a k-nearest-neighbour search in the asymmetric case?
To remedy the aforementioned symmetry assumption, I thought that instead of just one tree I could have two trees, constructed so that the costs are calculated in both directions, and then run a search through both trees. Then I began to wonder: does anyone know if there's already something designed specifically for asymmetric costs, and/or would the two-tree idea be totally astray?
It may also be that a k-d tree in two dimensions isn't necessarily the best-fitting solution, so pointers to other data structures and algorithms are welcome too - especially if someone has practical experience with my problem size. Wikipedia lists quite a bunch of approaches, and maybe even an approximate solution is good enough for what I'm trying to do (this is for a smallish game for learning purposes).
For each pair of cities, calculate the cost for every available travel type (foot, train, ...), convert them to one common unit, compare them and take the minimum. That single per-direction cost is what you then use in your search algorithm.
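Assuming the collapsed per-direction costs are stored in an asymmetric matrix cost[from, to] (or can be produced row by row), and since preprocessing is allowed and queries are repeated, a simple approach is to precompute the k cheapest outgoing neighbours of every city once; this sidesteps the k-d tree issue entirely, because travel cost is not a metric over coordinates. A sketch:

// Precompute, for every city, the indices of its k cheapest outgoing
// neighbours according to the asymmetric cost matrix.
static int[][] PrecomputeNearest(double[,] cost, int k)
{
    int n = cost.GetLength(0);
    var nearest = new int[n][];
    for (int from = 0; from < n; from++)
    {
        nearest[from] = Enumerable.Range(0, n)
            .Where(to => to != from)
            .OrderBy(to => cost[from, to])   // outgoing cost, direction matters
            .Take(k)
            .ToArray();
    }
    return nearest;
}

// usage: var top3 = PrecomputeNearest(cost, 3); a query is then just top3[cityA]

At 12000 cities the full matrix has 144 million entries, so you may prefer to build each row on the fly and keep only the k best entries per city.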
Summary: How would I go about solving this problem?
Hi there,
I'm working on a mixture-style maximization problem where my variables are going to be bounded by minima and maxima. A representative example of my problem might be:
maximize: (2x-3y+4z)/(x^2+y^2+z^2+3x+4y+5z+10)
subj. to: x+y+z=1
1 < x < 2
-2 < y < 3
5 < z < 8
where numerical coefficients and the minima/maxima are given.
My final project involves a more complicated problem similar to the one above. The structure of the problem won't change - only the coefficients and inputs will change. So with the example above, I would be looking for a set of functions that might allow a C# program to quickly determine x, then y, then z, like:
x = f(given inputs)
y = f(given inputs,x)
z = f(given inputs,x,y)
Would love to hear your thoughts on this one!
Thanks!
The standard optimization approach for your type of problem, non-linear minimization, is the Levenberg-Marquardt algorithm:
Levenberg–Marquardt algorithm
but unfortunately it does not directly support the linear constraints you have added. Many different approaches have been tried to add linear constraints to Levenberg-Marquardt with varying success.
Another algorithm I can recommend in this situation is the downhill simplex method (not to be confused with the simplex method for linear programming):
Nelder–Mead method
Like Levenberg-Marquardt, it works with non-linear objectives, but it can also handle the linear constraints, which act like discontinuities. This could work well for your case above.
In either case, this is not so much a programming problem as an algorithm selection problem. The literature is rife with algorithms and you can find C# implementations of either of the above with a little searching.
You can also combine algorithms. For example, you can do a preliminary search with the simplex method under the constraints and then refine the result with Levenberg-Marquardt without the constraints.
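To make that concrete with the example problem above, here is a minimal sketch of folding the constraints into penalty terms, so that any unconstrained minimizer (a Nelder-Mead implementation, say) can be applied; the minimizer itself is not shown, and the penalty weight 1e6 is an arbitrary choice you would tune.

// Negated objective (minimizers minimize) plus quadratic penalties for
// x + y + z = 1, 1 < x < 2, -2 < y < 3 and 5 < z < 8.
static double PenalizedObjective(double x, double y, double z)
{
    double f = (2 * x - 3 * y + 4 * z) /
               (x * x + y * y + z * z + 3 * x + 4 * y + 5 * z + 10);

    double penalty = Math.Pow(x + y + z - 1, 2);                                   // equality
    penalty += Math.Pow(Math.Max(0, 1 - x), 2) + Math.Pow(Math.Max(0, x - 2), 2);  // bounds on x
    penalty += Math.Pow(Math.Max(0, -2 - y), 2) + Math.Pow(Math.Max(0, y - 3), 2); // bounds on y
    penalty += Math.Pow(Math.Max(0, 5 - z), 2) + Math.Pow(Math.Max(0, z - 8), 2);  // bounds on z

    return -f + 1e6 * penalty;   // maximizing f == minimizing -f
}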
If your problem is that you want to solve linear programming problems efficiently, you can use Cassowary.net or NSolver.
If your problem is implementing a linear programming algorithm efficiently, you may want to read Combinatorial Optimization: Algorithms and Complexity which covers the Simplex algorithm in most of the detail provided in the short text An Illustrated Guide to Linear Programming but also includes information on the Ellipsoid algorithm, which can be more efficient for more complex constraint systems.
There's nothing inherently C#-specific about your question, but tagging it with that implies you're looking for a solution in C#; accordingly, reviewing the source code to the two toolkits above may serve you well.
I have very little data for my analysis, and so I want to produce more data for analysis through interpolation.
My dataset contains 23 independent attributes and 1 dependent attribute. How can interpolation be done for this?
EDIT:
My main problem is the shortage of data; I have to increase the size of my dataset. The attributes are categorical - for example, attribute A may be low, medium or high - so is interpolation the right approach for this or not?
This is a mathematical problem, but there is too little information in the question to answer it properly. Depending on the distribution of your real data, you may try to find a function that it follows. You could also try to interpolate the data using an artificial neural network, but that would be complex. The thing is, to find an interpolation you need to analyze the data you already have, and that defeats the purpose. There is probably more to this problem than is explained. What is the nature of the data? Can you place it in n-dimensional space? What do you expect to get from the analysis?
Roughly speaking, to interpolate an array:
static double Interpolate(double[] data, double requestedIndex)
{
    // e.g. requestedIndex = 1.25 interpolates between data[1] and data[2]
    if (requestedIndex < 0 || requestedIndex > data.Length - 1)
        throw new ArgumentOutOfRangeException(nameof(requestedIndex));
    int previousIndex = (int)requestedIndex;                       // in example, 1
    int nextIndex = Math.Min(previousIndex + 1, data.Length - 1);  // in example, 2
    double factor = requestedIndex - previousIndex;                // in example, 0.25
    // in example, this gives 75% of data[1] plus 25% of data[2]
    return data[previousIndex] * (1.0 - factor) + data[nextIndex] * factor;
}

// usage:
double result = Interpolate(LoadData(), 1.25);
This is still only a sketch: it assumes your data is evenly spaced in a plain array, interpolates along a single dimension, and so on.
Hope that helps to get you started - any questions please post a comment.
If the 23 independent variables are sampled in a hyper-grid (regularly spaced), then you can choose to partition into hyper-cubes and do linear interpolation of the dependent value from the vertex closest to the origin along the vectors defined from that vertex along the hyper-cube edges away from the origin. In general, for a given partitioning, you project the interpolation point onto each vector, which gives you a new 'coordinate' in that particular space, which can then be used to compute the new value by multiplying each coordinate by the difference of the dependent variable, summing the results, and adding to the dependent value at the local origin. For hyper-cubes, this projection is straightforward (you simply subtract the nearest vertex position closest to the origin.)
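To make that concrete, here is a small sketch of the scheme just described, written for regularly spaced samples; it assumes the query point's independent variables have already been rescaled into grid units (so integer values land on sample points), and sample is a hypothetical accessor returning the dependent value at a grid vertex.

// Linear interpolation within one hyper-cube cell: start from the vertex
// nearest the origin, then add, per axis, the fractional coordinate times the
// difference of the dependent value along that edge.
static double InterpolateCell(Func<int[], double> sample, double[] gridCoords)
{
    var v0 = gridCoords.Select(c => (int)Math.Floor(c)).ToArray(); // local origin vertex
    double result = sample(v0);

    for (int d = 0; d < gridCoords.Length; d++)
    {
        double frac = gridCoords[d] - v0[d];          // projection onto edge d
        var v1 = (int[])v0.Clone();
        v1[d] += 1;                                   // neighbouring vertex along edge d
        result += frac * (sample(v1) - sample(v0));
    }
    return result;
}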
If your samples are not uniformly spaced, then the problem is much more challenging, as you would need to choose an appropriate partitioning if you wanted to perform linear interpolation. In principle, Delaunay triangulation generalizes to N dimensions, but it's not easy to do and the resulting geometric objects are a lot harder to understand and interpolate than a simple hyper-cube.
One thing you might consider is if your data set is naturally amenable to projection so that you can reduce the number of dimensions. For instance, if two of your independent variables dominate, you can collapse the problem to 2-dimensions, which is much easier to solve. Another thing you might consider is taking the sampling points and arranging them in a matrix. You can perform an SVD decomposition and look at the singular values. If there are a few dominant singular values, you can use this to perform a projection to the hyper-plane defined by those basis vectors and reduce the dimensions for your interpolation. Basically, if your data is spread in a particular set of dimensions, you can use those dominating dimensions to perform your interpolation, since you don't really have much information in the other dimensions anyway.
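If you want to try the SVD idea, a library such as Math.NET Numerics will do the decomposition for you; the sketch below assumes that library, and its API names are quoted from memory, so verify them against the documentation. Rows are sample points, columns are the 23 independent variables.

using MathNet.Numerics.LinearAlgebra;

// samples: double[][] with one row per observation and 23 columns
var m = Matrix<double>.Build.DenseOfRowArrays(samples);
var svd = m.Svd();
foreach (var s in svd.S)          // singular values, largest first
    Console.WriteLine(s);
// If only a few values dominate, project onto the corresponding basis vectors
// and interpolate in that lower-dimensional space.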
I agree with the other commentators, however, that your premise may be off. You generally don't want to interpolate to perform analysis, as you're just choosing to interpolate your data in different ways and the choice of interpolation biases the analysis. It only makes sense if you have a compelling reason to believe that a particular interpolation is physically consistent and you simply need additional points for a particular algorithm.
May I suggest Cubic Spline Interpolation
http://www.coastrd.com/basic-cubic-spline-interpolation
Unless you have a very specific need, this is easy to implement and calculates splines well.
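If you would rather not implement the spline from the linked page yourself, Math.NET Numerics ships a natural cubic spline; the sketch below assumes that library (API quoted from memory, so double-check it).

using MathNet.Numerics.Interpolation;

double[] xs = { 0, 1, 2, 3, 4 };            // known sample positions
double[] ys = { 1.0, 2.7, 5.8, 6.6, 7.5 };  // known dependent values
var spline = CubicSpline.InterpolateNatural(xs, ys);
double value = spline.Interpolate(2.5);     // interpolated value between xs = 2 and 3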
Have a look at the regression methods presented in The Elements of Statistical Learning; most of them can be tried out in R. There are plenty of models that can be used: linear regression, local models and so on.
I know there are quite a few questions out there on generating combinations of elements, but I think this one has a certain twist that makes it worth a new question:
For a pet project of mine I have to pre-compute a lot of state to improve the runtime behavior of the application later. One of the steps I struggle with is this:
Given N tuples of two integers (let's call them points from here on, although they aren't points in my use case; they are roughly X/Y-related, though), I need to compute all valid combinations for a given rule.
The rule might be something like
"Every point included excludes every other point with the same X coordinate"
"Every point included excludes every other point with an odd X coordinate"
I hope and expect that such rules can be exploited to speed up the selection process, but my math skills are only just being resurrected as I type and I'm unable to come up with an elegant algorithm.
The set of points (N) starts small, but soon outgrows 64 (ruling out the "use a long as a bitmask" solutions).
I'm doing this in C#, but solutions in any language should be fine if they explain the underlying idea.
Thanks.
Update in response to Vlad's answer:
Maybe my idea to generalize the question was a bad one. My rules above were invented on the fly and are just placeholders. One realistic rule would look like this:
"Every point included excludes every other point in the triagle above the chosen point"
By that rule and by choosing (2,1) I'd exclude
(2,2) - directly above
(1,3) (2,3) (3,3) - next line
and so on
So the rules are fixed, not general. They are unfortunately more complex than the X/Y samples I initially gave.
How about "the x coordinate of every point included is the exact sum of some subset of the y coordinates of the other included points". If you can come up with a fast algorithm for that simply-stated constraint problem then you will become very famous indeed.
My point being that the problem as stated is so vague as to admit NP-complete or NP-hard problems. Constraint optimization problems are incredibly hard; if you cannot put extremely tight bounds on the problem then it very rapidly becomes not analyzable by machines in polynomial time.
For some special rule types your task seems to be simple. For example, for your rule #1 you need to choose a subset of all possible values of X, and then for each value from the subset assign an arbitrary Y.
For generic rules I doubt that it's possible to build an efficient algorithm without any AI.
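A C# sketch of the special case in this answer (rule #1: at most one included point per X value): group the points by X and build the combinations group by group; the names are invented for the sketch.

// Every valid combination picks, for each distinct X, either no point at all
// or exactly one of the points sharing that X.
static List<List<(int X, int Y)>> ValidCombinations(IEnumerable<(int X, int Y)> points)
{
    var combos = new List<List<(int X, int Y)>> { new List<(int X, int Y)>() };

    foreach (var group in points.GroupBy(p => p.X))
    {
        combos = combos
            .SelectMany(c => new[] { c }                              // skip this X entirely
                .Concat(group.Select(p => c.Append(p).ToList())))     // or include one of its points
            .ToList();
    }
    return combos;
}

Note that the number of combinations is the product of (group size + 1) over all X values, so the output itself explodes quickly - consistent with the worry in the question once N outgrows 64.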
My understanding of the problem is: given a method bool property( Point x ) const, find all points in the set for which property() is true. Is that reasonable?
The brute-force approach is to run all the points through property() and store the ones which return true. The time complexity of this is O( N ), where (a) N is the total number of points and (b) the property() method is O( 1 ). I guess you are looking for improvements over O( N ). Is that right?
For certain kinds of properties it is possible to improve on O( N ), provided a suitable data structure is used to store the points and suitable pre-computation (e.g. sorting) is done. However, this may not be true for an arbitrary property.
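For completeness, the brute-force pass in C# is a one-liner; Property here stands in for the hypothetical bool property( Point x ) predicate.

// One O(N) pass: keep exactly the points for which the property holds.
var matching = points.Where(p => Property(p)).ToList();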