In Microsoft Solver Foundation, I'd like to know if it's possible to add a parameter whose value depends on a decision's value.
I.e. I want to add something to the TSP model, but it should also take into account traffic from one point to another. Please note: traffic depends on the time of day at which the salesman travels that route.
Here is the model:
I have a matrix of all possible combinations between the cities.
The decision variable is the Order of the salesman's route: 0 is the first stop, 1 the second, and so on.
I have a parameter timeToTravel which is bound to a property that calculates, from the Order value, the time at which that leg of the route takes place and returns the travel time, including traffic, for that time of day.
It seems to me that the parameter values are read once and cached when the Solve function is called. Am I correct? If so, does anybody have any recommendations for solving this problem?
Originally I asked this question on the MSF forum, but I thought it would get more attention on Stack Overflow. Also, I'm open to solvers other than MSF, but I'd prefer to stay in the .NET environment.
There is a good article on solving the "static" Traveling Salesman Problem using Solver Foundation here. If you do not already have your own implementation, maybe you can base your solution on that code.
This is the goal formulation from the mentioned article:
// Goal: minimize the length of the tour.
Goal goal = model.AddGoal("TourLength", GoalKind.Minimize,
    Model.Sum(Model.ForEach(city, i =>
        Model.ForEachWhere(city,
            j => dist[i, j] * assign[i, j],
            j => i != j))));
If I understand correctly, in your problem the time to travel between two cities depends on the time of day?
I do not believe you can dynamically update the dist[,] double array during the optimization. However, using the building blocks of the Model class it should be possible to reformulate the dist[,] array as a set of functions that are dependent on the total distance/time already traveled.
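To make the time dependence concrete, here is a plain C# illustration (not Solver Foundation API, and the rush-hour buckets and factors are invented); in the actual model the same relationship would have to be rebuilt from Model operators, e.g. as piecewise terms over a departure-time decision, since ordinary C# code cannot be evaluated lazily by the solver:
using System;

static class TrafficTravelTime
{
    // baseMinutes[i, j]: free-flow travel time from city i to city j (assumed input data).
    public static double Compute(double[,] baseMinutes, int from, int to, double departureHour)
    {
        double multiplier =
            (departureHour >= 7 && departureHour < 9)   ? 1.8 :  // morning rush (assumed factor)
            (departureHour >= 16 && departureHour < 19) ? 1.6 :  // evening rush (assumed factor)
            1.0;                                                  // off-peak
        return baseMinutes[from, to] * multiplier;
    }
}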
For completeness, here is another interesting article on TSP formulation using OML.
I'm classifying users with a multiclass SVM (one-against-one), 3 classes. In the binary case, I would be able to plot the distribution of each feature's weight in the hyperplane equation for different training sets. In that case, I don't really need a PCA to see the stability of the hyperplane and the relative importance of the features (which are reduced and centered, by the way). What would the alternative be with a multiclass SVM, given that for each training set you have 3 classifiers and you choose one class according to the result of the three classifiers (how does that work again? the class that appears the maximum number of times, or the largest discriminant? whichever it is, it does not really matter here). Does anyone have an idea?
And if it matters, I am writing in C# with Accord.
Thank you !
In a multi-class SVM that uses the one-vs-one strategy, the problem is divided into a set of smaller binary problems. For example, if you have three possible classes, using the one-vs-one strategy requires the creation of n(n-1)/2 binary classifiers. In your example, this would be
n(n-1)/2 = 3(3-1)/2 = (3*2)/2 = 3
Each of those will be specialized in the following problems:
Distinguishing between class 1 and class 2 (let's call it svma).
Distinguishing between class 1 and class 3 (let's call it svmb).
Distinguishing between class 2 and class 3 (let's call it svmc).
Now, I see that you actually asked multiple questions in your original post, so I will answer them separately. First I will clarify how the decision process works, and then tell you how you could detect which features are the most important.
Since you mentioned Accord.NET, there are two ways this framework might compute the multi-class decision. The default is to use a Decision Directed Acyclic Graph (DDAG), which is nothing more than the sequential elimination of classes. The other way is to solve all binary problems and take the class that won most of the time. You can choose between them at the moment you classify a new sample by setting the method parameter of the SVM's Compute method.
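For example, assuming the Accord.NET 2.x API and an already-trained multi-class machine named ksvm (check the documentation of your version, as the exact overloads may differ):
// input is the feature vector to classify; ksvm is a trained MulticlassSupportVectorMachine.
int byElimination = ksvm.Compute(input, MulticlassComputeMethod.Elimination); // DDAG (the default)
int byVoting      = ksvm.Compute(input, MulticlassComputeMethod.Voting);      // class that wins most votes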
Since the winning-most-of-the-time version is straightforward to understand, I will explain a little more about the default approach, the DDAG.
Decision using directed acyclic graphs
In this algorithm, we test each of the SVMs and eliminate the class that lost at each round. So for example, the algorithm starts with all possible classes:
Candidate classes: [1, 2, 3]
Now it asks svma to classify x, it decides for class 2. Therefore, class 1 lost and is not considered anymore in further tests:
Candidate classes: [2, 3]
Now it asks svmc (the machine distinguishing class 2 from class 3) to classify x, and it decides for class 2. Therefore, class 3 lost and is not considered anymore in further tests:
Candidate classes: [2]
The final answer is thus 2.
Detecting which features are the most useful
Now, since the one-vs-one SVM is decomposed into (n(n-1)/2) binary problems, the most straightforward way to analyse which features are the most important is by considering each binary problem separately. Unfortunately it might be tricky to globally determine which are the most important for the entire problem, but it will be possible to detect which ones are the most important to discriminate between class 1 and 2, or class 1 and 3, or class 2 and 3.
However, I can offer a suggestion if you are using DDAGs. With a DDAG, it is possible to extract the decision path that led to a particular decision. This means it is possible to estimate how many times each of the binary machines was used when classifying your entire database. If you can estimate the importance of a feature for each of the binary machines, and estimate how often each machine is used during the decision process on your database, you could take their weighted sum as an indicator of how useful a feature is in your decision process.
By the way, you might also be interested in trying one of the Logistic Regression Support Vector Machines using L1-regularization with a high C to perform sparse feature selection:
// Create a new linear machine
var svm = new SupportVectorMachine(inputs: 2);

// Create a new instance of the sparse logistic learning algorithm
var smo = new ProbabilisticCoordinateDescent(svm, inputs, outputs)
{
    // Set learning parameters
    Complexity = 100,
};

// Learn the machine from the training data (inputs/outputs are the
// training vectors and their labels)
double error = smo.Run();
I'm not an expert in ML or SVMs; I am self-taught. However, my prototype outperformed some similar commercial and academic software in accuracy, while the training time is about 2 hours, in contrast to the days and weeks(!) some competitors need.
My recognition system (patterns in bio-cells) uses the following approach to select the best features:
1) Extract the features and calculate the mean and variance for all classes.
2) Select the features whose class means are most distant from each other and whose variances are minimal (a small sketch of this scoring follows the list).
3) Remove redundant features - those whose mean histograms over the classes are similar.
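Here is a minimal sketch of the scoring idea in step 2 (illustrative only, not my actual implementation): each feature is scored by how far apart the class means are relative to the pooled within-class variance, and the highest-scoring features are kept.
using System;
using System.Collections.Generic;
using System.Linq;

static class FeatureScoring
{
    // samplesPerClass[c] holds the feature vectors belonging to class c.
    public static double[] Score(IList<double[][]> samplesPerClass, int featureCount)
    {
        var scores = new double[featureCount];
        for (int f = 0; f < featureCount; f++)
        {
            double[] means = samplesPerClass.Select(c => c.Average(x => x[f])).ToArray();
            double[] variances = samplesPerClass
                .Select((c, i) => c.Average(x => Math.Pow(x[f] - means[i], 2)))
                .ToArray();

            double spread = means.Max() - means.Min();   // how far apart the class means are
            double noise = variances.Average() + 1e-12;  // pooled within-class variance (avoid division by zero)
            scores[f] = spread * spread / noise;         // higher score = more discriminative feature
        }
        return scores;
    }
}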
In my prototype I'm using parametric features, e.g. a feature "circle" with parameters diameter, threshold, etc.
The training is controlled by scripts defining which features, with which argument ranges, are to be used, so the software tests all possible combinations.
For some training-time optimization:
The software begins with 5 instances per class for extracting the features and increases the number once condition 2 is met.
There are probably academic names for some of these steps; unfortunately I'm not aware of them, as I've "reinvented the wheel" myself.
I've been messing around with the AForge.NET time series genetic algorithm sample and I've got my own version working; at the moment it's just 'predicting' Fibonacci numbers.
The problem is that when I ask it to predict new values beyond the array I've given it (which contains the first 21 numbers of the sequence, using a window size of 5), it won't do it; it throws an exception that says "Data size should be enough for window and prediction".
As far as I can tell I'm supposed to decipher the bizarre formula contained in "population.BestChromosome" and use that to extrapolate future values, is that right? Is there an easier way? Am I overlooking something massively obvious?
I'd ask on the aforge forum but the developer is not supporting it anymore.
As far as I can tell I'm supposed to decipher the bizarre formula
contained in "population.BestChromosome" and use that to extrapolate
future values, is that right?
What you call a "bizarre formula" is called a model in data analysis. You learn such a model from past data, and you can feed it new data to get a predicted outcome. Whether that new outcome makes sense or is just garbage depends on how general your model is. Many techniques can learn models that explain the observed data very well but do not generalize, and they will return useless results when you feed new data into them. You need to find a model that explains both the given data and potentially unobserved data, which is a non-trivial process.
Usually people estimate the generalization error of a model by splitting the known data into two partitions: one on which the model is learned and another on which the learned model is tested. You then want to select the model that is accurate on both. You can also check out the answer I gave on another question here, which also treats the topic of machine learning: https://stackoverflow.com/a/3764893/189767
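A minimal sketch of that holdout split (generic C#, nothing library-specific): shuffle the samples, learn on one part, and measure the error on the part the model never saw.
using System;
using System.Linq;

static class Holdout
{
    // Splits data into a training part and a test part; trainFraction is e.g. 0.7.
    public static void Split<T>(T[] data, double trainFraction, out T[] train, out T[] test, int seed = 42)
    {
        var rng = new Random(seed);
        T[] shuffled = data.OrderBy(_ => rng.Next()).ToArray();
        int cut = (int)(shuffled.Length * trainFraction);
        train = shuffled.Take(cut).ToArray();   // used to learn the model
        test = shuffled.Skip(cut).ToArray();    // used only to estimate generalization error
    }
}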
I don't think you're "overlooking something massively obvious", but rather you're faced with a problem that is not trivial to solve.
By the way, you can also use genetic programming (GP) in HeuristicLab. The model produced by GP is a mathematical formula, and in HeuristicLab you can export that model to e.g. MATLAB.
Regarding Fibonacci, the closed formula (Binet's formula) for Fibonacci numbers is F(n) = (phi^n - psi^n) / sqrt(5), where phi = (1 + sqrt(5)) / 2 is the golden ratio and psi = (1 - sqrt(5)) / 2. If you want to find that with GP you need one variable (n), three constants, and the power function. However, it's very likely that you will find a vastly different formula that is similar in output; the problem in machine learning is that very different models can produce the same output. The recursive form requires that you include the values of the previous two terms in the data set; this is similar to learning a model for a time series regression problem.
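For reference, a small sketch of that closed form (Binet's formula), handy as a sanity check against whatever expression the GP run produces:
using System;

static class Binet
{
    public static long Fibonacci(int n)
    {
        double sqrt5 = Math.Sqrt(5.0);
        double phi = (1.0 + sqrt5) / 2.0;  // golden ratio
        double psi = (1.0 - sqrt5) / 2.0;
        return (long)Math.Round((Math.Pow(phi, n) - Math.Pow(psi, n)) / sqrt5);
    }
}

// Binet.Fibonacci(10) == 55, matching the recursive definition F(n) = F(n-1) + F(n-2).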
I'm working on a new project.
My best analogy would be a psychological evaluation test maker.
Aspect #1.
The end-business-user needs to create test questions, with question types, and possible responses to the questions where applicable.
Examples:
1. Do you have red hair? (T/F)
2. What is your favorite color? (Single Response/Multiple Choice)
(Possible Responses: Red, Yellow, Black, White)
3. What size shoe do you wear (rounded to next highest size)? (Integer)
4. How much money do you have on you right now? (Dollar Amount (decimal))
So I need to be able to create questions, their question type, and for some of the questions, possible answers.
Here:
Number 1 is a known type of "True or False".
Number 2 is a known type of "Single Response/Multiple Choice" AND the end-business-user will create the possible responses.
Number 3 is a known type of "Integer". The end-user (person taking the evaluation) can basically put in any integer value.
Number 4 is a known type of "Decimal". Same thing as Integer.
Aspect #2.
The end-business-user needs to evaluate the person's responses. And assign some scalar value to a set of responses.
Example:
If someone responded:
1. T
2. Red
3. >=8
4. (Not considered for this situation)
Some psychiatrist expert figures out that if someone answered with the above responses, that person is 85% more at risk for depression than normal. (Here 85% is a number the end-business-user (the psychiatrist) can enter as a parameter.)
So Aspect #2 is running through someone's responses, and determining a result.
The "response grid" would have to be setup so that it will go through (some or all) the possibilities in a priority ranking order, and then after all conditions are met (on a single row), exit out with the result.
Like this:
(1.) T (2.) Red (3.) >=8 ... Result = 85%
(1.) T (2.) Yellow (3.) >=8 ... Result = 70%
(1.) F (2.) Red (3.) >=8 ... Result = 50%
(1.) F (2.) Yellow (3.) >=8 ... Result = 40%
Once a match is found, you exit with that percentage; if a row doesn't match, you move on to the next rule.
Also, running with this psych-evaluation mock example, I don't need to define every permutation. A lot of questions on psych evaluations are not actually used; they are just "fluff". So in my grid above, I have purposely left out question #4. It has no bearing on the results.
There can also be a "Did this person take this seriously?" evaluation grid:
(3.) >=20 (4.) >=$1,000 ... Result = False
(The possibility of having a shoe size >= 20 and having big dollars in your pocket is very low, thus you probably did not take the psych test seriously.)
If no rule matches (in my real application, not this mock-up), I would throw an exception or just not care. I would not need a default or fall-through rule.
In the above, Red and Yellow are "worrisome" favorite colors. If your favorite color is black or white, it has no bearing upon your depression risk factor.
I have used Business Rule Engines in the past. (InRule for example).
They are very expensive, and it is not in the budget.
BizTalk Business Rules Framework is a possibility. Not de$irable, but possible.
My issue with any rules engine (I have limited experience with business rules engines, mind you) is that the "vocabulary" is based on objects with static properties.
public class Employee
{
    public string LastName { get; set; }
    public DateTime HireDate { get; set; }
    public decimal AnnualSalary { get; set; }

    public void AdjustSalary(int percentage)
    {
        // Increase the salary by the given percentage.
        this.AnnualSalary = this.AnnualSalary + (this.AnnualSalary * percentage / 100m);
    }
}
Business rules would be easy to create against this.
If
    the (Employee's HireDate) is before (10 years ago)
then
    (Increase Their Salary) By (5) Percent.
else
    (Increase Their Salary) By (3) Percent.
But in my situation, the Test is composed of (dynamic) Questions and (dynamic) Responses, not predetermined properties.
So I guess I'm looking for some ideas to investigate on how to pull this off.
I know I can build a "TestMaker" application fairly quickly.
The biggest issue is integrating the Questions and (Possible Responses) into "evaluating rules".
Thanks for any tips.
Technologies:
DotNet 4.0 Framework
Sql Server 2008 Backend Database
VS2010 Pro, C#
If it's a small application, i.e. tens of users as opposed to thousands, and it's not business critical, then I would recommend Excel.
The advantages are that most business users are very familiar with Excel and most probably have it on their machines. Basically, you ship an Excel workbook to your business users. They enter the questions and the type (Decimal, True/False, etc.), then click a button which triggers an Excel macro. This could generate an XML configuration file or put the questions into SQL. Your application just reads it, displays it, and collects responses as usual.
The main advantage of Excel comes in Aspect #2, the dynamic, user-chosen business rules. In another sheet of the same Excel document, the end business user can specify as many of the permutations of the responses/questions as they like. Excel users are very familiar with entering simple formulas like =IF(AND(A1>20, A2<50), ...) etc. Again, the user pushes a button and either generates a configuration file or inputs the data into SQL Server.
In your application you iterate through the rules table and apply the rules to the responses.
Given the number of users/surveys etc., this simple solution would be much simpler than BizTalk or a full-on custom rules engine. BizTalk would be good if you need to talk to multiple systems, integrate their data, enforce rules, create workflows, etc. Most of the other rules engines are also geared towards really big, complex rule systems. The complexity in those rule systems isn't just the number of rules or permutations; it is mostly having to talk to multiple systems, "end-dating" certain rules, putting in rules with future kick-off dates, etc.
In your case an Excel-based system, or a similar datagrid on a web page, would work fine, unless of course you are doing a project for Gartner, some other global data-survey firm, or a major government statistical organisation; in that case I would suggest BizTalk or other commercial rules engines.
In your case, it's SQL tables with questions, answer types, and rules to apply. Input is made user-friendly via Excel or an "Excel on the web" style datagrid, and then you just iterate through the rules and apply them to the response table, as sketched below.
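Here is a minimal sketch of that iteration (the class and member names are hypothetical, not from the question): each rule is a set of per-question conditions plus a result, rules are tried in priority order, and the first rule whose conditions all hold wins.
using System;
using System.Collections.Generic;
using System.Linq;

class Rule
{
    public int Priority;
    // questionId -> predicate over the respondent's raw answer for that question
    public Dictionary<int, Func<string, bool>> Conditions = new Dictionary<int, Func<string, bool>>();
    public string Result;
}

static class RuleEvaluator
{
    // answers: questionId -> raw answer text captured from the respondent
    public static string Evaluate(IEnumerable<Rule> rules, IDictionary<int, string> answers)
    {
        foreach (var rule in rules.OrderBy(r => r.Priority))
        {
            bool allMet = rule.Conditions.All(c =>
                answers.ContainsKey(c.Key) && c.Value(answers[c.Key]));
            if (allMet)
                return rule.Result;   // first full match wins
        }
        return null;                  // no rule matched; the caller decides what to do
    }
}
The row "(1.) T (2.) Red (3.) >=8 ... Result = 85%" from the question would then be three predicates over questions 1-3 with "85%" as the rule's Result.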
Are you sure you want to use a business rule engine for this problem?
As far as I understand it, the use case for a BRE is:
Mostly static execution flow
Some decisions do change often
In your use case, the whole system (Q/A flow and evaluation) is dynamic, so IMHO a simple domain-specific language for the whole system would be a better solution.
You might want to take some inspiration from testMaker - a web based software exactly for this workflow. (Disclaimer: I contributed a bit to this project.) Its scoring rules are quite basic, so this might not help you that much. It was designed to export data to SPSS and build reports from there...
Be sure you are modeling a database suitable for hierarchical objects; this article would help.
create a table for tests
create a table for questions, with columns question, testid, and questiontype
create a table for answers, with columns answerid, questionid, answer, and istrue
answers belong to questions
one question can have many answers
create a table for users or employees
create a table for user answers, with columns answerid and selectedanswer
Evaluation (use object variables for boolean/integer coverage, and wrap the function call in try/catch for better exception coverage):
// Returns whether the user's answer satisfies the question's condition.
static bool Evaluate(string questiontype, string answer, string useranswer)
{
    switch (questiontype) // condition can be truefalse, biggerthan, smallerthan, equals
    {
        case "biggerthan":  return decimal.Parse(useranswer) > decimal.Parse(answer);
        case "smallerthan": return decimal.Parse(useranswer) < decimal.Parse(answer);
        case "truefalse":
        case "equals":      return useranswer == answer;
        default:            return false;
    }
}
Get the output as a data dictionary and post it here, please.
Without a data schema, the help you get will be limited.
I am trying to use SVM for News article classification.
I created a table that contains the features (unique words found in the documents) as rows.
I created weight vectors mapped to these features, i.e. if the article contains a word that is in the feature table, that position is marked as 1, otherwise 0.
Ex:- Training sample generated...
1 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1
10:1 11:1 12:1 13:1 14:1 15:1 16:1
17:1 18:1 19:1 20:1 21:1 22:1 23:1
24:1 25:1 26:1 27:1 28:1 29:1 30:1
As this is the first document all the features are present.
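To illustrate the generation step described above, here is a simplified sketch (the names are illustrative, not my actual code):
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class LibSvmLine
{
    // featureIndex maps each known word to its 1-based feature number.
    public static string Build(int label, IEnumerable<string> articleWords, IDictionary<string, int> featureIndex)
    {
        var indices = articleWords
            .Where(featureIndex.ContainsKey)
            .Select(w => featureIndex[w])
            .Distinct()
            .OrderBy(i => i);

        var line = new StringBuilder(label.ToString());
        foreach (int i in indices)
            line.Append(' ').Append(i).Append(":1");   // binary presence weight
        return line.ToString();
    }
}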
I am using 1, 0 as class labels.
I am using svm.Net for classification.
I gave 300 weight vectors manually classified as training data and the model generated is taking all the vectors as support vectors, which is surely overfitting.
My total features (unique words/row count in feature vector DB table) is 7610.
What could be the reason?
Because of this overfitting, my project is now in pretty bad shape. It is classifying every article available as a positive article.
In LibSVM binary classification is there any restriction on the class label?
I am using 0, 1 instead of -1 and +1. Is that a problem?
You need to do some type of parameter search. Also, if the classes are unbalanced, the classifier might get artificially high accuracy without doing much. This guide is good at teaching basic, practical things; you should probably read it.
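A minimal sketch of such a parameter search in C# (trainAndValidate is a placeholder for your own routine that trains with the given C and gamma and returns accuracy on held-out data; it is not an svm.Net call):
using System;

static class GridSearch
{
    public static void Run(Func<double, double, double> trainAndValidate)
    {
        double[] cValues = { 0.01, 0.1, 1, 10, 100, 1000 };
        double[] gammaValues = { 1e-4, 1e-3, 1e-2, 1e-1, 1.0 };

        double bestC = 0, bestGamma = 0, bestAccuracy = double.MinValue;
        foreach (double c in cValues)
            foreach (double gamma in gammaValues)
            {
                double accuracy = trainAndValidate(c, gamma);   // validation accuracy for this pair
                if (accuracy > bestAccuracy)
                {
                    bestAccuracy = accuracy;
                    bestC = c;
                    bestGamma = gamma;
                }
            }

        Console.WriteLine("Best C={0}, gamma={1}, accuracy={2:P1}", bestC, bestGamma, bestAccuracy);
    }
}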
As pointed out, a parameter search is probably a good idea before doing anything else.
I would also investigate the different kernels available to you. The fact that your input data is binary might be problematic for the RBF kernel (or might render its usage sub-optimal compared to another kernel). I have no idea which kernel would be better suited, though. Try a linear kernel and look around for more suggestions/ideas :)
For more information and perhaps better answers, look on stats.stackexchange.com.
I would definitely try using -1 and +1 for your labels, that's the standard way to do it.
Also, how much data do you have? Since you're working in 7610-dimensional space, you could potentially have that many support vectors, where a different vector is "supporting" the hyperplane in each dimension.
With that many features, you might want to try some type of feature selection method like principal component analysis.
I know there are quite a few questions out there on generating combinations of elements, but I think this one has enough of a twist to be worth a new question:
For a pet project of mine I have to pre-compute a lot of state to improve the runtime behavior of the application later. One of the steps I struggle with is this:
Given N tuples of two integers (let's call them points from here on, although they aren't points in my use case; they are roughly X/Y related, though), I need to compute all valid combinations for a given rule.
The rule might be something like
"Every point included excludes every other point with the same X coordinate"
"Every point included excludes every other point with an odd X coordinate"
I hope and expect that this fact leads to an improvement in the selection process, but my math skills are just being resurrected as I type and I'm unable to come up with an elegant algorithm.
The set of points (N) starts small but soon outgrows 64 (ruling out the "use a long as a bitmask" solutions).
I'm doing this in C#, but solutions in any language should be fine as long as they explain the underlying idea.
Thanks.
Update in response to Vlad's answer:
Maybe my idea to generalize the question was a bad one. My rules above were invented on the fly and are just placeholders. One realistic rule would look like this:
"Every point included excludes every other point in the triangle above the chosen point"
By that rule and by choosing (2,1) I'd exclude
(2,2) - directly above
(1,3) (2,3) (3,3) - next line
and so on
So the rules are fixed, not general. They are unfortunately more complex than the X/Y samples I initially gave.
How about "the x coordinate of every point included is the exact sum of some subset of the y coordinates of the other included points". If you can come up with a fast algorithm for that simply-stated constraint problem then you will become very famous indeed.
My point is that the problem as stated is so vague as to admit NP-complete or NP-hard instances. Constraint optimization problems are incredibly hard; if you cannot put extremely tight bounds on the problem, it very rapidly becomes unanalyzable by machines in polynomial time.
For some special rule types your task seems to be simple. For example, for your example rule #1 you need to choose a subset of all possible values of X, and then for each value from the subset assign an arbitrary Y (see the sketch below).
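A small sketch of that construction (my own illustration, assuming example rule #1): group the points by X and, for each X value, independently choose either no point or exactly one of its points.
using System;
using System.Collections.Generic;
using System.Linq;

static class SameXCombinations
{
    // Enumerates every selection in which no two chosen points share an X coordinate.
    public static IEnumerable<List<Tuple<int, int>>> Enumerate(IEnumerable<Tuple<int, int>> points)
    {
        var groups = points.GroupBy(p => p.Item1).Select(g => g.ToList()).ToList();
        return Expand(groups, 0);
    }

    private static IEnumerable<List<Tuple<int, int>>> Expand(List<List<Tuple<int, int>>> groups, int index)
    {
        if (index == groups.Count)
        {
            yield return new List<Tuple<int, int>>();   // base case: the empty selection
            yield break;
        }

        foreach (var rest in Expand(groups, index + 1))
        {
            yield return rest;                           // option 1: skip this X value entirely

            foreach (var p in groups[index])             // option 2: take exactly one of its points
                yield return new List<Tuple<int, int>>(rest) { p };
        }
    }
}
The number of valid combinations is then the product over the X groups of (group size + 1), which is why the exact rule matters so much for feasibility.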
For generic rules I doubt that it's possible to build an efficient algorithm without any AI.
My understanding of the problem is: given a method bool property( Point x ) const, find all points in the set for which property() is true. Is that reasonable?
The brute-force approach is to run all the points through property() and store the ones for which it returns true. The time complexity of this is O(N), where (a) N is the total number of points and (b) the property() method is O(1). I guess you are looking for improvements over O(N). Is that right?
For certain kinds of properties it is possible to improve on O(N), provided a suitable data structure is used to store the points and suitable pre-computation (e.g. sorting) is done. However, this may not be true for an arbitrary property.