Possible Combination of Knapsack problem and? - c#

Alright, quick overview.
I have looked into the knapsack problem
http://en.wikipedia.org/wiki/Knapsack_problem
and I know it is what I need for my project, but the complicated part is that I need multiple sacks inside a main sack.
The large knapsack that holds all the "bags" can only carry x "bags" (let's say 9 for the sake of example). Each bag has different values:
Weight
Cost
Size
Capacity
and so on. All of those values are integers; let's assume from 0 to 100.
Each inner bag will also be assigned a type, and there can only be one bag of each type within the outer bag, although the program input will contain multiple bags of the same type.
I need to assign a maximum weight that the main bag can hold, and all other properties of the smaller bags need to be grouped by weighted values.
Example
Outer Bag:
Can hold 9 smaller bags
Weight no more than 98 [Give or take 5 either side]
Must hold one of each type, Can only hold one of each type at a time.
Inner Bags:
Cost, Weighted at 100%
Size, Weighted at 67%
Capacity, Weighted at 44%
The program will be given multiple bags as input and must then work out combinations of smaller bags to go into the larger bag. There will be multiple solutions depending on the input, and the program should output the best ones.
I am wondering what you guys think would be the best way for me to approach this.
I will be programming it in either Java or C#. I would love to program it in PHP, but I'm afraid the algorithm would be very inefficient for web servers.
Thanks for any help you can give
-Zack

Okay, well, knapsack is NP-hard, so I'm pretty certain this will be NP-hard as well (if it weren't, you could solve knapsack by doing this with only one outer bag). So for an exactly optimal solution, you're probably not going to do better than searching all combinations. The outline of the program you want will be something like
for each possible combination
do
    if current combination is better than best previous
        save current combination as best so far
    fi
od
and the run time will be exponential. It sounds, though, like you might be able to get a near-optimal solution with dynamic programming.
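The outline above could be sketched roughly as follows in Python (it should port to Java or C# without much trouble). This is only a sketch under assumptions: the field names (`type`, `weight`, `cost`, `size`, `capacity`), the `WEIGHTS` table, and the idea that a higher weighted sum is "better" are all made up for illustration, since the question does not say how the weighted values are combined.

```python
from itertools import product

# Hypothetical weighting factors, taken from the example in the question.
WEIGHTS = {"cost": 1.00, "size": 0.67, "capacity": 0.44}


def score(bag):
    """Weighted sum of one bag's properties (assumption: higher is better)."""
    return sum(bag[key] * factor for key, factor in WEIGHTS.items())


def best_combination(bags, min_weight, max_weight):
    """Try every way of picking exactly one bag per type (exhaustive search)."""
    by_type = {}
    for bag in bags:
        by_type.setdefault(bag["type"], []).append(bag)

    best, best_score = None, float("-inf")
    # product() yields one bag from each type group: the "for each combination" loop.
    for combo in product(*by_type.values()):
        total_weight = sum(bag["weight"] for bag in combo)
        if not (min_weight <= total_weight <= max_weight):
            continue  # outside the allowed weight window
        total_score = sum(score(bag) for bag in combo)
        if total_score > best_score:
            best, best_score = combo, total_score
    return best, best_score


bags = [
    {"type": "A", "weight": 50, "cost": 10, "size": 10, "capacity": 10},
    {"type": "A", "weight": 40, "cost": 20, "size": 0, "capacity": 0},
    {"type": "B", "weight": 50, "cost": 5, "size": 5, "capacity": 5},
    {"type": "B", "weight": 70, "cost": 100, "size": 100, "capacity": 100},
]
best, total = best_combination(bags, min_weight=93, max_weight=103)
```

The number of combinations is the product of the group sizes, so with 9 types the loop grows exponentially with the number of bags per type, as noted above.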

Consider using Prolog for your logic programming. There are multiple implementations of it, including P# on Mono (.NET). There's a bit of a learning curve, but once you get used to it, it's pretty much in a league of its own for this kind of problem solving.
Hope this helps. Cheers!
link to P#

Related

3D Data Interpolation in C#

I'm looking for a simple function in C# to interpolate my 3D data.
I already have a list of around 100-150 data sets, each with 3 double values.
-25.000000 -0.770568 2.444945
-20.000000 -0.726583 2.467809
-15.000000 -0.723274 2.484167
-10.000000 -0.723114 2.506445
and so on...
The chart created by these values usually looks like this. I'm not sure if this counts as scattered or still as gridded data ...
In the end I want to hand over two double values and get the third then from the interpolation function. It shouldn't flatten the surface, it should still go through all the given data points.
Since I don't have the time to look into all possible algorithms and lack the mathematical background, I'm a bit overwhelmed by all the possibilities that get thrown at me: Kriging, Delaunay triangulation, NURBS and many more ...
In addition, most solutions I found on the net were either for a different language, outdated, or commercial (e.g. ILNumerics; I'm still not sure if they have the solution).
In MATLAB there is a griddata function that does exactly this (and is based on a kriging algorithm as far as I know), but in this case C# is mandatory for me.
Thank you for your help; criticism and suggestions are welcome.

Parse 2D array to rectangles

I'm looking for a way to convert a 2D array to the fewest possible rectangles like in this example:
         X
     12345678
     --------
   1|00000000
   2|00011100
   3|00111000
 Y 4|00111000
   5|00111000
   6|00000000
to the corner coordinates of the rectangles,
following the (x1,y1);(x2,y2) template:
rectangle #1 (4,2);(6,2)
rectangle #2 (3,3);(5,5)
There has been a similar question here before but unfortunately, the link provided in its answer is broken, and I cannot check it anymore.
I'd like to do this in C# but any kind of help is appreciated.
(It doesn't even have to be the fewest possible rectangles, but the fewer the better :) )
Thanks in advance!
I think that you are trying to cover a set of points in the 2D plane with the minimum required number of rectangles. An answer to "Find k rectangles so that they cover the maximum number of points" said that this was an NP-complete problem and included a link (which works for me). A Google search finds http://2011.cccg.ca/PDFschedule/papers/paper102.pdf.
These papers agree that rectangle covering is NP-complete but do not actually prove it, and the references for this seem to be unusually elusive - https://cstheory.stackexchange.com/questions/3957/prove-that-the-problem-of-rectilinear-picture-compression-is-np-complete
What I take from these documents is this:
It is unlikely that there is an affordable way of getting the absolutely best answer for large problems. You might have to either spend a lot of time to get exact answers for problems that are in some sense small (by exhausting all possible alternatives, or perhaps using something like branch and bound), or settle for affordable methods, such as greedy search, beam search, or limited discrepancy search, which are not guaranteed to give you the absolutely best answer.
In this case there seem to be more restricted versions of this problem which are not NP-complete. You might read a paper and find that some detail of your problem means that its method applies to you. One example is "An Algorithm for Constructing Regions with Rectangles: Independence and Minimum Generating Sets for Collections of Intervals" by Franzblau and Kleitman. I found this in the ACM Digital Library, though I don't know if it is generally accessible. It works for a restricted set of polygons.
This may help you get started. If you convert the binary data to numbers, you get this:
0
28
56
56
56
0
So wherever there are consecutive equal numbers, there is a rectangle (provided each row contains a single contiguous run of 1s).
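A minimal Python sketch of that row-matching idea, extended slightly so a row may contain several runs of 1s. It assumes the grid arrives as a list of strings of '0' and '1' (a made-up input format), and it is a greedy heuristic: it merges identical runs on consecutive rows and does not guarantee the fewest possible rectangles.

```python
def runs(row):
    """Return (start, end) column pairs, 1-based and inclusive, for each run of '1's."""
    found, start = [], None
    for x, cell in enumerate(row, start=1):
        if cell == "1" and start is None:
            start = x
        elif cell != "1" and start is not None:
            found.append((start, x - 1))
            start = None
    if start is not None:
        found.append((start, len(row)))
    return found


def rectangles(grid):
    """Merge identical runs on consecutive rows into (x1, y1, x2, y2) rectangles."""
    open_rects = {}  # (x1, x2) -> y1 of a rectangle that is still growing
    done = []
    for y, row in enumerate(grid, start=1):
        current = set(runs(row))
        # Close every rectangle whose run does not continue on this row.
        for span in list(open_rects):
            if span not in current:
                done.append((span[0], open_rects.pop(span), span[1], y - 1))
        # Start a new rectangle for every run that is not already open.
        for span in current:
            open_rects.setdefault(span, y)
    # Rectangles that reach the last row are closed there.
    for (x1, x2), y1 in open_rects.items():
        done.append((x1, y1, x2, len(grid)))
    return sorted(done, key=lambda r: (r[1], r[0]))


grid = [
    "00000000",
    "00011100",
    "00111000",
    "00111000",
    "00111000",
    "00000000",
]
result = rectangles(grid)  # [(4, 2, 6, 2), (3, 3, 5, 5)]
```

For the grid in the question this gives the two rectangles listed there, (4,2);(6,2) and (3,3);(5,5).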

Predicting new (unknown) sequence values using aforge GA

I've been messing around with the AForge time series genetic algorithm sample and I've got my own version working. At the moment it's just 'predicting' Fibonacci numbers.
The problem is when I ask it to predict new values beyond the array I've given it (which contains the first 21 numbers of the sequence, using a window size of 5) it won't do it, it throws an exception that says "Data size should be enough for window and prediction".
As far as I can tell I'm supposed to decipher the bizarre formula contained in "population.BestChromosome" and use that to extrapolate future values, is that right? Is there an easier way? Am I overlooking something massively obvious?
I'd ask on the AForge forum but the developer is not supporting it anymore.
"As far as I can tell I'm supposed to decipher the bizarre formula contained in "population.BestChromosome" and use that to extrapolate future values, is that right?"
What you call a "bizarre formula" is called a model in data analysis. You learn such a model from past data and then feed it new data to get a predicted outcome. Whether that outcome makes sense or is just garbage depends on how general your model is. Many techniques can learn models that explain the observed data very well but do not generalize, and these return useless results when you feed new data into them. You need to find a model that explains both the given data and potentially unobserved data, which is a non-trivial process. Usually people estimate the generalization error of a model by splitting the known data into two partitions: one on which the model is learned and another on which the learned models are tested. You then want to select the model that is accurate on both partitions. You can also check out the answer I gave on another question, which also treats the topic of machine learning: https://stackoverflow.com/a/3764893/189767
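To make the two-partition idea concrete, here is a small Python sketch. Everything in it is an illustrative assumption rather than anything from AForge: the two candidate "models" are hand-written stand-ins for formulas a GA might produce, and the 70/30 chronological split is an arbitrary choice.

```python
def fibonacci(count):
    """First `count` Fibonacci numbers, starting 0, 1, 1, 2, ..."""
    values = [0, 1]
    while len(values) < count:
        values.append(values[-1] + values[-2])
    return values[:count]


def windows(series, size):
    """Pair each run of `size` past values with the value that follows it."""
    return [(series[i:i + size], series[i + size])
            for i in range(len(series) - size)]


def mean_abs_error(model, samples):
    """Average absolute difference between prediction and target."""
    return sum(abs(model(past) - target) for past, target in samples) / len(samples)


# Hypothetical candidate models; a GA would evolve formulas like these.
candidates = {
    "sum of last two": lambda past: past[-1] + past[-2],
    "double the last": lambda past: 2 * past[-1],
}

samples = windows(fibonacci(21), size=5)
split = int(len(samples) * 0.7)  # earlier samples for learning, later ones for testing
train, test = samples[:split], samples[split:]

# (training error, test error) per candidate; prefer the one that is low on both.
errors = {name: (mean_abs_error(model, train), mean_abs_error(model, test))
          for name, model in candidates.items()}
```

A model that only looks acceptable on the training partition, like "double the last" here, shows a much larger error on the held-out partition.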
I don't think you're "overlooking something massively obvious", but rather you're faced with a problem that is not trivial to solve.
Btw, you can also use genetic programming (GP) in HeuristicLab. The model of GP is a mathematical formula, and in HeuristicLab you can export that model to e.g. MATLAB.
Regarding Fibonacci: the closed formula for Fibonacci numbers is F(n) = (phi^n - psi^n) / sqrt(5), where phi = (1 + sqrt(5)) / 2 and psi = (1 - sqrt(5)) / 2, according to Wikipedia. If you want to find that with GP you need one variable (n), three constants, and the power function. However, it's very likely that you will find a vastly different formula that is similar in output. The problem in machine learning is that very different models can produce the same output. The recursive form requires that you include the values of the previous two n in the data set. This is similar to learning a model for a time series regression problem.
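As a quick check that the closed form and the recursive form really are two different models with the same output, here is a short Python sketch. It uses phi = (1 + sqrt(5)) / 2 and psi = (1 - sqrt(5)) / 2; the function names are made up, and the rounding step is an assumption that only holds while floating-point precision lasts (fine for small n).

```python
import math

PHI = (1 + math.sqrt(5)) / 2
PSI = (1 - math.sqrt(5)) / 2


def fib_closed(n):
    """Closed form: one variable (n), three constants, and the power function."""
    return round((PHI ** n - PSI ** n) / math.sqrt(5))


def fib_recursive_form(n):
    """Recurrence F(n) = F(n-1) + F(n-2), computed iteratively from the past two values."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


same_output = all(fib_closed(n) == fib_recursive_form(n) for n in range(40))
```

Unlike the windowed time-series model, the closed form needs no past values, so it can be evaluated beyond the end of the training array.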

LibSVM turns all my training vectors into support vectors, why?

I am trying to use SVM for News article classification.
I created a table that contains the features (unique words found in the documents) as rows.
I created weight vectors that map to these features, i.e. if the article contains a word that is in the feature table, that position is marked as 1, otherwise 0.
Example of a generated training sample:
1 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1
10:1 11:1 12:1 13:1 14:1 15:1 16:1
17:1 18:1 19:1 20:1 21:1 22:1 23:1
24:1 25:1 26:1 27:1 28:1 29:1 30:1
As this is the first document all the features are present.
I am using 1, 0 as class labels.
I am using svm.Net for classification.
I gave 300 manually classified weight vectors as training data, and the generated model takes all of the vectors as support vectors, which is surely overfitting.
My total features (unique words/row count in feature vector DB table) is 7610.
What could be the reason?
Because of this overfitting my project is now in pretty bad shape. It is classifying every available article as a positive article.
In LibSVM binary classification is there any restriction on the class label?
I am using 0, 1 instead of -1 and +1. Is that a problem?
You need to do some type of parameter search. Also, if the classes are unbalanced, the classifier might get artificially high accuracy without doing much. This guide is good at teaching basic, practical things; you should probably read it.
As pointed out, a parameter search is probably a good idea before doing anything else.
I would also investigate the different kernels available to you. The fact that your input data is binary might be problematic for the RBF kernel (or might render its usage sub-optimal compared to another kernel). I have no idea which kernel could be better suited, though. Try a linear kernel, and look around for more suggestions and ideas :)
For more information and perhaps better answers, look on stats.stackexchange.com.
I would definitely try using -1 and +1 for your labels, that's the standard way to do it.
Also, how much data do you have? Since you're working in 7610-dimensional space, you could potentially have that many support vectors, where a different vector is "supporting" the hyperplane in each dimension.
With that many features, you might want to try some type of dimensionality reduction, such as principal component analysis.
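For reference, the binary word-presence encoding from the question, written with the -1/+1 labels suggested above, might be generated like this. This is a Python sketch that only builds text lines in the sparse LibSVM format (`label index:value`, ascending indices, zero features omitted). The helper names and the whitespace tokenizer are assumptions, and it does not call LibSVM or svm.Net.

```python
def build_vocabulary(documents):
    """Assign each unique word a 1-based feature index, in order of first appearance."""
    vocab = {}
    for text in documents:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def to_libsvm_line(text, label, vocab):
    """Encode one document as '<label> <index>:1 ...' listing only the words present."""
    present = sorted({vocab[word] for word in text.lower().split() if word in vocab})
    features = " ".join(f"{index}:1" for index in present)
    return f"{label:+d} {features}".rstrip()


documents = ["stocks rise today", "football match today"]
vocab = build_vocabulary(documents)
lines = [
    to_libsvm_line(documents[0], +1, vocab),  # "+1 1:1 2:1 3:1"
    to_libsvm_line(documents[1], -1, vocab),  # "-1 3:1 4:1 5:1"
]
```

Only the first document contains every word seen so far, which matches the observation in the question that the first training sample has all features set.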

All valid combinations of points, in the most (speed) effective way

I know there are quite some questions out there on generating combinations of elements, but I think this one has a certain twist to be worth a new question:
For a pet project of mine I have to pre-compute a lot of state to improve the runtime behavior of the application later. One of the steps I struggle with is this:
Given N tuples of two integers (let's call them points from here on, although they aren't in my use case; they are roughly X/Y related, though), I need to compute all valid combinations for a given rule.
The rule might be something like
"Every point included excludes every other point with the same X coordinate"
"Every point included excludes every other point with an odd X coordinate"
I hope and expect that this fact leads to an improvement in the selection process, but my math skills are just being resurrected as I type and I'm unable to come up with an elegant algorithm.
The set of points (N) starts small, but outgrows 64 soon (for the "use long as bitmask" solutions)
I'm doing this in C#, but solutions in any language should be fine if they explain the underlying idea.
Thanks.
Update in response to Vlad's answer:
Maybe my idea to generalize the question was a bad one. My rules above were invented on the fly and just placeholders. One realistic rule would look like this:
"Every point included excludes every other point in the triangle above the chosen point"
By that rule and by choosing (2,1) I'd exclude
(2,2) - directly above
(1,3) (2,3) (3,3) - next line
and so on
So the rules are fixed, not general. They are unfortunately more complex than the X/Y samples I initially gave.
How about "the x coordinate of every point included is the exact sum of some subset of the y coordinates of the other included points". If you can come up with a fast algorithm for that simply-stated constraint problem then you will become very famous indeed.
My point being that the problem as stated is so vague as to admit NP-complete or NP-hard problems. Constraint optimization problems are incredibly hard; if you cannot put extremely tight bounds on the problem then it very rapidly becomes not analyzable by machines in polynomial time.
For some special rule types your task seems to be simple. For example, for your example rule #1 you need to choose a subset of all possible values of X, and then for each value in the subset pick an arbitrary Y.
For generic rules I doubt that it's possible to build an efficient algorithm without any AI.
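For the special case of rule #1, the "choose a subset of X values, then pick a Y for each" idea could look like this in Python. It is a sketch with a made-up function name; it assumes the input points are distinct and counts the empty combination as valid.

```python
from itertools import product


def combinations_unique_x(points):
    """Yield every subset of points in which no two points share an X value."""
    ys_by_x = {}
    for x, y in points:
        ys_by_x.setdefault(x, []).append(y)
    xs = sorted(ys_by_x)
    # For each X value, either skip it (None) or pick exactly one of its Y values.
    choices = [[None] + ys_by_x[x] for x in xs]
    for picks in product(*choices):
        yield [(x, y) for x, y in zip(xs, picks) if y is not None]


points = [(1, 1), (1, 2), (2, 1)]
valid = list(combinations_unique_x(points))
```

Because only valid combinations are generated, nothing has to be filtered afterwards. The count is the product of (number of points with that X + 1) over all X values, here 3 * 2 = 6, so the output itself can still be exponentially large.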
My understanding of the problem is: Given a method bool property( Point x ) const, find all points in the set for which property() is true. Is that reasonable?
The brute-force approach is to run all the points through property(), and store the ones which return true. The time complexity of this would be O( N ) where (a) N is the total number of points, and (b) the property() method is O( 1 ). I guess you are looking for improvements on O( N ). Is that right?
For certain kinds of properties, it is possible to improve on O( N ) provided a suitable data structure is used to store the points and suitable pre-computation (e.g. sorting) is done. However, this may not be true for an arbitrary property.
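As a small illustration of that last point, here is a Python sketch that contrasts the brute-force filter with a pre-sorted lookup. The property used ("x is at least some threshold") is a made-up example of a property that benefits from sorting; arbitrary properties will not.

```python
from bisect import bisect_left


def brute_force(points, prop):
    """O(N): test every point against the property."""
    return [p for p in points if prop(p)]


def points_with_x_at_least(sorted_points, threshold):
    """Binary search for the first qualifying point; requires points sorted by x.

    The tuple (threshold,) sorts before every (threshold, y), so bisect_left
    lands on the first point whose x is >= threshold in O(log N) steps.
    """
    start = bisect_left(sorted_points, (threshold,))
    return sorted_points[start:]


points = sorted([(3, 1), (1, 5), (2, 2), (3, 0), (5, 9)])  # one-time pre-computation
fast = points_with_x_at_least(points, 3)
slow = brute_force(points, lambda p: p[0] >= 3)
```

Finding the start position is logarithmic, although copying out the matching points still costs time proportional to how many there are.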
