Random Forest trains but tree count is zero?

I'm using EmguCV to access the OpenCV machine learning algorithms. I can successfully train an RTree (training reports success), but when I try to predict, it always gives me -1. I then tried to get the variable importance matrix and the tree count: the matrix comes back as null (even though I set the params to build it) and the tree count comes back as 0.
Does anyone have any thoughts on what I'm doing wrong? PS: if I use a decision tree, I can get predictions.
I have 6 variables and about 11000 samples.
Below are the parameters I use:
MCvRTParams param = new MCvRTParams();
param.maxDepth = 8;// max depth
param.minSampleCount = 10;// min sample count
param.regressionAccuracy = 0;// regression accuracy: N/A here
param.useSurrogates = true; //compute surrogate split, no missing data
param.maxCategories = 15;// max number of categories (use sub-optimal algorithm for larger numbers)
param.cvFolds = 10;
//param.use1seRule = true;
param.truncatePrunedTree = true;
//param.priors = priorsHandle.AddrOfPinnedObject(); // the array of priors
Thanks

Try setting regressionAccuracy to a non-zero number. regressionAccuracy stops the splitting of a node once the accuracy within that node is better than regressionAccuracy. If you set it to zero, splitting will stop immediately at the root node.

Related

Compare if elements are almost equal in a list in C# .NET

I am still very much a beginner in C# and .NET and just need to do this simple test.
var odds = new System.Collections.Generic.List<double>();
// here is a code which adds the values in the list
foreach(var odd in odds)
{
System.Console.WriteLine(odd);
}
and the output is something like this:
13.098252624859418
14.098252624859349
13.098252624859577
13.098252624853423
14.098252624859398
So I would like to check whether all the values inside the list are almost equal. That means a small difference between the numbers in the list (such as 13 and 14) should still be acceptable, as long as the difference is at most 2.

Check the difference between the maximum value and the minimum value in the list against a tolerance value (2 in your case). For example (Max and Min need using System.Linq):
double delta = 2;
// getting largest element
var maxNum = odds.Max();
// getting smallest element
var minNum = odds.Min();
var almostEqual = maxNum - minNum <= delta;
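The same check can be wrapped in a small helper. This is only a sketch: the names ToleranceCheck and AllWithin are made up for illustration, and treating an empty list as "almost equal" is an assumption you may want to change.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ToleranceCheck
{
    // Returns true when the spread (max - min) of the values is at most delta.
    // Assumption: an empty collection counts as "all almost equal".
    public static bool AllWithin(IReadOnlyCollection<double> values, double delta)
    {
        if (values.Count == 0)
            return true; // Max() and Min() throw on an empty sequence

        return values.Max() - values.Min() <= delta;
    }
}
```

With the list from the question, ToleranceCheck.AllWithin(odds, 2) compares a spread of roughly 1.0 against the tolerance of 2.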

You'll need to do it manually, as is recommended for every floating-point comparison (because floating-point math is unintuitive). Doing that is quite simple, something like this:
var a = 13.098252624859418;
var b = 14.098252624859398;
// define your acceptable range, e.g. with 1.0, numbers that differ by at most 1.0 are treated as equal
var delta = 1.0;
var areNearlyEqual = Math.Abs(a - b) <= delta; // true
Now if you want to check whether every element in a List is nearly equal to every other element, there is a naïve solution and a simpler one. I'll start with the naïve one:
(Don't actually use this implementation, this is for illustration purposes of how to check equality of all items in a list which aren't just numbers)
var allAreNearlyEqual = true; // Let's start off assuming all are equal
foreach (var x in odds)
{
    if (!allAreNearlyEqual)
        break;
    foreach (var y in odds)
    {
        // Parentheses are required: ! cannot be applied to a double
        if (!(Math.Abs(x - y) <= delta))
            allAreNearlyEqual = false;
    }
}
Console.WriteLine(allAreNearlyEqual);
As you can see, we need to iterate over every element in the list (x) and compare it to every other element in the list (y). There is an easier-to-read (and also faster*) version of this:
var max = odds.Max();
var min = odds.Min();
if (Math.Abs(max - min) <= delta)
Console.WriteLine("All items are nearly equal");
else
Console.WriteLine("Not all items are nearly equal");
(This takes advantage of the fact that if the min and max are close enough to be nearly equal, then all the elements between them are too.)
You can look at the implementation of Max to see how it's done, but basically it's just a foreach loop which returns the highest value found.
*The second version is faster because it makes two passes over the list, which is O(N), whereas the first version is O(N^2). I added the first version to illustrate how you could do the same thing on a list of objects which are not just numbers.
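For objects that are not plain numbers, where there is no single min and max to compare, the pairwise idea could look roughly like this. It is a sketch under assumptions: PairwiseCheck, AllPairsMatch and the nearlyEqual delegate are hypothetical names, and the comparison is assumed to be symmetric so each pair is only checked once.

```csharp
using System;
using System.Collections.Generic;

public static class PairwiseCheck
{
    // Compares every unordered pair once and stops at the first mismatch.
    // Assumption: nearlyEqual(a, b) gives the same result as nearlyEqual(b, a).
    public static bool AllPairsMatch<T>(IReadOnlyList<T> items, Func<T, T, bool> nearlyEqual)
    {
        for (int i = 0; i < items.Count; i++)
        {
            for (int j = i + 1; j < items.Count; j++)
            {
                if (!nearlyEqual(items[i], items[j]))
                    return false;
            }
        }
        return true;
    }
}
```

This is still O(N^2) in the worst case, but it returns as soon as one pair fails instead of finishing the inner loop.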

Algorithm to find first sum of an array within a range

I have a fairly complicated (to me) algorithm that I'm trying to write. The idea is to determine which elements in an array are the first ones to sum up to a value that falls within a range.
For example:
I have an array [1, 15, 25, 22, 25] that is in a prioritized order.
I want to find the first set of values with the most elements that sums to a value within a minimum and maximum range, not necessarily the set that gets me closest to my max.
So, if the min is 1 and max is 25, I would select [0(1), 1(15)] even though the third element [2(25)] is closer to my max of 25 because those come first.
If the min is 25 and max is 40, I would select [0(1), 1(15), 3(22)], skipping the third element since that would breach the max.
If the min is 50 and max is 50, I would select [2(25), 4(25)] since those are the only two that can meet the min and max requirements.
Are there any common CS algorithms that match this pattern?

This is a dynamic programming problem.
You want to build a data structure to answer the following question:
    by next to last position available in the array:
        by target sum:
            (elements in sum, last position used)
When it finds a target_sum in range, you just read back through it to get the answer.
Here is pseudocode for that. I used slightly Pythonish syntax and JSON to represent the data structure. Your code will be longer:
Initialize the lookup to [{0: (0, null)}]
for i in 1..(length of array):
    # Build up our dynamic programming data structure
    Add empty mapping {} to end of lookup
    for prev_sum, (prev_elements, prev_position) in lookup[i-1]:
        # Try not using this element
        if prev_sum not in lookup[i] or lookup[i][prev_sum][0] < prev_elements:
            lookup[i][prev_sum] = (prev_elements, prev_position)
        # Try using this element
        next_sum = prev_sum + array[i-1]
        next_elements = prev_elements + 1
        next_position = i-1
        if next_sum not in lookup[i] or lookup[i][next_sum][0] < next_elements:
            lookup[i][next_sum] = (next_elements, next_position)
# Pick the sum in the desired range that uses the most elements
best_sum = null
best_elements = null
for this_sum, (this_elements, this_position) in lookup[length of array]:
    if this_sum in desired range:
        if best_elements is null or best_elements < this_elements:
            best_elements = this_elements
            best_sum = this_sum
if best_elements is not null:
    # Read out the answer!
    answer = []
    j = lookup[length of array][best_sum][1]
    while j is not null:
        answer.append(array[j])
        best_sum = best_sum - array[j]
        j = lookup[j][best_sum][1]
    return reversed(answer)
This will return the desired values rather than the indexes. To switch, just reverse what goes into the answer.
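For reference, here is one way the same dynamic programming idea might look in C#. Treat it as a sketch, not a finished solution: RangeSubset and FindIndexes are hypothetical names, the values are assumed to be non-negative integers (which is what allows sums above max to be discarded early), it returns indexes instead of values, and ties between subsets of the same size are broken arbitrarily, so the priority order from the question is not fully enforced.

```csharp
using System.Collections.Generic;

public static class RangeSubset
{
    // Returns the indexes of a subset with the most elements whose sum lies in
    // [min, max], or null when no non-empty subset qualifies.
    public static List<int> FindIndexes(int[] array, int min, int max)
    {
        // lookup[i][sum] = (Count, Last): the largest subset of the first i elements
        // that reaches sum; Last is the index of its final element (-1 for none).
        var lookup = new List<Dictionary<int, (int Count, int Last)>>
        {
            new Dictionary<int, (int Count, int Last)> { [0] = (0, -1) }
        };

        for (int i = 1; i <= array.Length; i++)
        {
            var current = new Dictionary<int, (int Count, int Last)>();
            foreach (var entry in lookup[i - 1])
            {
                // Option 1: skip element i-1 and carry the previous state forward.
                if (!current.TryGetValue(entry.Key, out var kept) || kept.Count < entry.Value.Count)
                    current[entry.Key] = entry.Value;

                // Option 2: use element i-1, unless the sum would exceed max.
                int nextSum = entry.Key + array[i - 1];
                if (nextSum > max)
                    continue;
                int nextCount = entry.Value.Count + 1;
                if (!current.TryGetValue(nextSum, out var used) || used.Count < nextCount)
                    current[nextSum] = (nextCount, i - 1);
            }
            lookup.Add(current);
        }

        // Pick the in-range sum that was reached with the most elements.
        int bestSum = 0, bestCount = 0;
        foreach (var entry in lookup[array.Length])
        {
            if (entry.Key >= min && entry.Key <= max && entry.Value.Count > bestCount)
            {
                bestSum = entry.Key;
                bestCount = entry.Value.Count;
            }
        }
        if (bestCount == 0)
            return null;

        // Follow the Last links backwards to recover the chosen indexes.
        var answer = new List<int>();
        int sum = bestSum;
        int position = lookup[array.Length][sum].Last;
        while (position >= 0)
        {
            answer.Add(position);
            sum -= array[position];
            position = lookup[position][sum].Last;
        }
        answer.Reverse();
        return answer;
    }
}
```

For the array in the question, a range of 25 to 40 has only one three-element candidate (1 + 15 + 22), and a range of 50 to 50 can only be met by the two 25s. For the range 1 to 25 there are two two-element candidates, and which one comes back depends on the arbitrary tie-break.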

.NET equivalent of Java's TreeSet.floor & TreeSet.ceiling

As an example, there's a Binary Search Tree which holds a range of values. Before adding a new value, I need to check if it already contains its 'almost duplicate'. I have a Java solution which simply performs floor and ceiling plus a further condition to do the job.
JAVA: Given a TreeSet, floor() returns the greatest element in this set less than or equal to the given element; ceiling() returns the least element in this set greater than or equal to the given element
TreeSet<Long> set = new TreeSet<>();
long l = (long)1; // anything
Long floor = set.floor(l);
Long ceil = set.ceiling(l);
C#: Closest data structure seems to be SortedSet<>. Could anyone advise the best way to get floor and ceil results for an input value?
SortedSet<long> set = new SortedSet<long>();

A linear scan with LINQ is, as mentioned, not the answer: since this is a tree, we expect logarithmic times. Java's floor and ceiling methods are logarithmic. GetViewBetween is logarithmic and so are Max and Min, so:
floor for SortedSet<long>:
sortedSet.GetViewBetween(long.MinValue, num).Max
ceiling for SortedSet<long>:
sortedSet.GetViewBetween(num, long.MaxValue).Min
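One thing to watch with these one-liners: Max and Min on an empty view return default(long), which is 0, so "no floor" looks the same as "the floor is 0". A possible way to wrap them, assuming you are happy with nullable results (Floor and Ceiling are hypothetical extension method names, not part of SortedSet):

```csharp
using System.Collections.Generic;

public static class SortedSetExtensions
{
    // Greatest element less than or equal to value, or null when there is none.
    public static long? Floor(this SortedSet<long> set, long value)
    {
        var view = set.GetViewBetween(long.MinValue, value);
        return view.Count > 0 ? view.Max : (long?)null;
    }

    // Least element greater than or equal to value, or null when there is none.
    public static long? Ceiling(this SortedSet<long> set, long value)
    {
        var view = set.GetViewBetween(value, long.MaxValue);
        return view.Count > 0 ? view.Min : (long?)null;
    }
}
```

The cost of working with views has, as far as I know, differed between .NET versions, so it is worth measuring on your runtime before relying on logarithmic behaviour.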

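You can use something like this. In LINQ there is a Last method (and LastOrDefault). Note that this scans the set, so it is O(n):
// num is the number whose floor is to be calculated
// the floor is the greatest element that is less than or equal to num
if (sortedSet.Count > 0 && sortedSet.Min <= num)
{
    // we have a floor
    var floor = sortedSet.Last(i => i <= num);
}
else
{
    // nothing in the set is less than or equal to num
}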

Best way to group list of doubles, to add up to a specific value

I have a task that I'm unsure how to approach.
There's a list of doubles, and I need to group them together so that they add up to a specific value.
Say I have:
14.6666666666666,
14.6666666666666,
2.37499999999999,
1.04166666666665,
1.20833333333334,
1.20833333333334,
13.9583333333333,
1.20833333333334,
3.41666666666714,
3.41666666666714,
1.20833333333334,
1.20833333333334,
14.5416666666666,
1.20833333333335,
1.04166666666666,
And I would like to group them into set values such as 12, 14, 16.
I would like to take the highest value in the list, then group it with small ones to get as close as possible to the nearest set value above it.
Example:
Take the double 14.6666666666666 and group it with 1.20833333333334 to bring me close to 16, and if there are any more small doubles left in the list, group them with that as well.
Then move on to the next double in the list.

That's literally the "Cutting Stock Problem" (sometimes called the one-dimensional bin packing problem). There are a number of well-documented solutions.
The only way to be sure of getting the "optimal" solution (other than a quantum computer) is to cycle through every combination and select the best outcome.
A quicker way to get an "OK" solution is called the "First Fit Algorithm". It takes the requested values in the order they come, and removes them from the first piece of material that can fulfill the request.
The "First Fit Algorithm" can be slightly improved by pre-ordering the values from largest to smallest, and pre-ordering the materials from smallest to largest. You could also use the material that is closest to being completely consumed by the request, instead of the first piece that can fulfill the request.
A compromise, but one that requires more code, is a "Genetic Algorithm". This is an oversimplification, but you could use the basic idea of the "First Fit Algorithm" and randomly swap two of the values before each pass. If the efficiency increases, you keep the change, and if it decreases, you go back to the last state. Repeat until a fixed amount of time has passed or until you're happy.
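To make the first-fit idea concrete, here is a rough sketch of the "largest to smallest" variant with a single group size. The names FirstFit and Pack are made up, and using one capacity (for example 16) instead of the 12/14/16 choices from the question is a simplifying assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FirstFit
{
    // First fit, largest values first: each value goes into the first group that
    // still has room for it; a new group is opened when no existing group fits.
    // Note: a value larger than capacity still ends up in a group of its own.
    public static List<List<double>> Pack(IEnumerable<double> values, double capacity)
    {
        var groups = new List<List<double>>();
        var totals = new List<double>(); // running sum of each group

        foreach (var value in values.OrderByDescending(v => v))
        {
            int index = totals.FindIndex(total => total + value <= capacity);
            if (index < 0)
            {
                groups.Add(new List<double>());
                totals.Add(0);
                index = groups.Count - 1;
            }
            groups[index].Add(value);
            totals[index] += value;
        }
        return groups;
    }
}
```

Calling FirstFit.Pack(values, 16) returns one inner list per group; this gives a quick "OK" grouping, not necessarily the optimal one.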

Put the doubles in a list and sort them. Grab the highest value that is less than the target to start. Then loop through from the start of the list adding the values until you reach a point where adding the value will put you over the limit.
var threshold = 16;
List<double> values = new List<double>();
values.Add(14.932034);
etc...
Sort the list:
values = values.OrderBy(p => p).ToList();
Grab the highest value that is less than your threshold:
// Highest value under threshold
var highestValue = values.Where(x => x < threshold).Max();
Now perform your search and calculations until you reach your solution:
var currentValue = highestValue;
Console.WriteLine("Starting with: " + currentValue);
foreach(var val in values)
{
    if(currentValue + val <= threshold)
    {
        currentValue = currentValue + val;
        Console.WriteLine(" + " + val.ToString());
    }
    else
        break;
}
Console.WriteLine("Finished with: " + currentValue.ToString());
Console.ReadLine();
Repeat the process for the next value and so on until you've output all of the solutions you want.

How to transform one list to another semi-aggregated list via LINQ?

I'm a LINQ beginner, so I'm just looking for someone to let me know whether the following is possible to implement with LINQ and, if so, for some pointers on how it could be achieved.
I want to transform one financial time series list into another where the second series list will be same length or shorter than the first list (usually it will be shorter, i.e., it becomes a new list where the elements themselves represent aggregation of information of one or more elements from the 1st list). How it collapses the list from one to the other depends on the data in the first list. The algorithm needs to track a calculation that gets reset upon new elements added to second list. It may be easier to describe via an example:
List 1 (time ordered from beginning to end series of closing prices and volume):
{P=7,V=1}, {P=10,V=2}, {P=10,V=1}, {P=10,V=3}, {P=11,V=5}, {P=12,V=1}, {P=13,V=2}, {P=17,V=1}, {P=15,V=4}, {P=14,V=10}, {P=14,V=8}, {P=10,V=2}, {P=9,V=3}, {P=8,V=1}
List 2 (series of open/close price ranges and summation of volume for such range period using these 2 param settings to transform list 1 to list 2: param 1: Price Range Step Size = 3, param 2: Price Range Reversal Step Size = 6):
{O=7,C=10,V=1+2+1}, {O=10,C=13,V=3+5+1+2}, {O=13,C=16,V=0}, {O=16,C=10,V=1+4+10+8+2}, {O=10,C=8,V=3+1}
In list 2, I explicitly am showing the summation of the V attributes from list 1 in list 2. But V is just a long so it would just be one number in reality. So how this works is opening time series price is 7. Then we are looking for first price from this initial starting price where delta is 3 away from 7 (via param 1 setting). In list 1, as we move thru the list, the next step is upwards move to 10 and thus we've established an "up trend". So now we build our first element in list 2 with Open=7,Close=10 and sum up the Volume of all bars used in first list to get to this first step in list 2. Now, next element starting point is 10. To build another up step, we need to advance another 3 upwards to create another up step or we could reverse and go downwards 6 (param 2). With data from list 1, we reach 13 first, so that builds our second element in list 2 and sums up all the V attributes used to get to this step. We continue on this process until end of list 1 processing.
Note the gap jump that happens in list 1. We still want to create a step element of {O=13,C=16,V=0}. The V of 0 is simply stating that we have a range move that went thru this step but had Volume of 0 (no actual prices from list 1 occurred here - it was above it but we want to build the set of steps that lead to price that was above it).
Second to last entry in list 2 represents the reversal from up to down.
Final entry in list 2 just uses final Close from list 1 even though it really hasn't finished establishing full range step yet.
Thanks for any pointers of how this could be potentially done via Linq if at all.

My first thought is, why try to use LINQ on this? It seems like a better situation for making a new Enumerable using the yield keyword to partially process and then spit out an answer.
Something along the lines of this:
public struct PricePoint
{
    // fields must be public to be readable from calculateRanges
    public ulong price;
    public ulong volume;
}
public struct RangePoint
{
    public ulong open;
    public ulong close;
    public ulong volume;
}
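// The two parameters from the question: step size 3, reversal step size 6
const ulong STEP = 3;
const ulong REVERSAL_STEP = 6;
public static IEnumerable<RangePoint> calculateRanges(IEnumerable<PricePoint> pricePoints)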
{
if (pricePoints.Count() > 0)
{
ulong open = pricePoints.First().price;
ulong volume = pricePoints.First().volume;
foreach(PricePoint pricePoint in pricePoints.Skip(1))
{
volume += pricePoint.volume;
if (pricePoint.price > open)
{
if ((pricePoint.price - open) >= STEP)
{
// We have established an up-trend.
RangePoint rangePoint;
rangePoint.open = open;
rangePoint.close = pricePoint.price;
rangePoint.volume = volume;
open = pricePoint.price;
volume = 0;
yield return rangePoint;
}
}
else
{
if ((open - pricePoint.price) >= REVERSAL_STEP)
{
// We have established a reversal.
RangePoint rangePoint;
rangePoint.open = open;
rangePoint.close = pricePoint.price;
rangePoint.volume = volume;
open = pricePoint.price;
volume = 0;
yield return rangePoint;
}
}
}
RangePoint lastPoint;
lastPoint.open = open;
lastPoint.close = pricePoints.Last().price;
lastPoint.volume = volume;
yield return lastPoint;
}
}
This isn't yet complete. For instance, it doesn't handle gapping, and there is an unhandled edge case where the last data point might be consumed, but it will still process a "lastPoint". But it should be enough to get started.
