Difficulty with Dijkstra's algorithm - c#

I am new to advanced algorithms, so please bear with me. I am currently trying to get Dijkstra's algorithm to work, and have spend 2 days trying to figure this out. I also read the pseudo-code on Wikipedia and got this far. I want to get the shortest distance between two vertices. In the below sample, I keep getting the wrong distance. Please help?
Sample graph setup is as follow:
Graph graph = new Graph();
graph.Vertices.Add(new Vertex("A"));
graph.Vertices.Add(new Vertex("B"));
graph.Vertices.Add(new Vertex("C"));
graph.Vertices.Add(new Vertex("D"));
graph.Vertices.Add(new Vertex("E"));
graph.Edges.Add(new Edge
{
From = graph.Vertices.FindVertexByName("A"),
To = graph.Vertices.FindVertexByName("B"),
Weight = 5
});
graph.Edges.Add(new Edge
{
From = graph.Vertices.FindVertexByName("B"),
To = graph.Vertices.FindVertexByName("C"),
Weight = 4
});
graph.Edges.Add(new Edge
{
From = graph.Vertices.FindVertexByName("C"),
To = graph.Vertices.FindVertexByName("D"),
Weight = 8
});
graph.Edges.Add(new Edge
{
From = graph.Vertices.FindVertexByName("D"),
To = graph.Vertices.FindVertexByName("C"),
Weight = 8
});
//graph is passed as param with source and dest vertices
public int Run(Graph graph, Vertex source, Vertex destvertex)
{
Vertex current = source;
List<Vertex> queue = new List<Vertex>();
foreach (var vertex in graph.Vertices)
{
vertex.Weight = int.MaxValue;
vertex.PreviousVertex = null;
vertex.State = VertexStates.UnVisited;
queue.Add(vertex);
}
current = graph.Vertices.FindVertexByName(current.Name);
current.Weight = 0;
queue.Add(current);
while (queue.Count > 0)
{
Vertex minDistance = queue.OrderBy(o => o.Weight).FirstOrDefault();
queue.Remove(minDistance);
if (current.Weight == int.MaxValue)
{
break;
}
minDistance.Neighbors = graph.GetVerTextNeigbours(current);
foreach (Vertex neighbour in minDistance.Neighbors)
{
Edge edge = graph.Edges.FindEdgeByStartingAndEndVertex(minDistance, neighbour);
int dist = minDistance.Weight + (edge.Weight);
if (dist < neighbour.Weight)
{
//from this point onwards i get stuck
neighbour.Weight = dist;
neighbour.PreviousVertex = minDistance;
queue.Remove(neighbour);
queueVisited.Enqueue(neighbor);
}
}
minDistance.State = VertexStates.Visited;
}
//here i want to record all node that was visited
while (queueVisited.Count > 0)
{
Vertex temp = queueVisited.Dequeue();
count += temp.Neighbors.Sum(x => x.Weight);
}
return count;
}

There are several immediate issues with the above code.
The current variable is never re-assigned, nor is the object mutatd, so graph.GetVerTextNeigbours(current) always returning the same set.
As a result, this will never even find the target vertex if more than one edge must be traversed.
The neighbor node is removed from the queue via queue.Remove(neighbour). This is incorrect and can prevent a node/vertex from being [re]explored. Instead, it should just have the weight updated in the queue.
In this case the "decrease-key v in Q" is the psuedo-code results in a NO-OP in the implementation due object-mutation and not using a priority queue (i.e. sorted) structure.
The visited queue, as per queueVisited.Enqueue(neighbor) and the later "sum of all neighbor weights of all visited nodes", is incorrect.
After the algorithm runs, the cost (Weight) of any node/vertex that was explored is the cost of the shortest path to that vertex. Thus, you can trivially ask the goal vertex what the cost is.
In addition, for the sake of the algorithm simplicity, I recommend giving up the "Graph/Vertex/Edge" layout initially created and, instead, pass in a simple Node. This Node can enumerate its neighbor Nodes, knows the weights to each neighbor, its own current weight, and the best-cost node to it.
Simply build this Node graph once, at the start, then pass the starting Node. This eliminates (or greatly simplifies) calls like GetVerTextNeighbors and FindEdgeByStartingAndEndVertex, and strange/unnecessary side-effects like assigning Neighbors each time the algorithm visits the vertex.

Related

Shortest list from a two dimensional array

This question is more about an algorithm than actual code, but example code would be appreciated.
Let's say I have a two-dimensional array such as this:
A B C D E
--------------
1 | 0 2 3 4 5
2 | 1 2 4 5 6
3 | 1 3 4 5 6
4 | 2 3 4 5 6
5 | 1 2 3 4 5
I am trying to find the shortest list that would include a value from each row. Currently, I am going row by row and column by column, adding each value to a SortedSet and then checking the length of the set against the shortest set found so far. For example:
Adding cells {1A, 2A, 3A, 4A, 5A} would add the values {0, 1, 1, 2, 1} which would result in a sorted set {0, 1, 2}. {1B, 2A, 3A, 4A, 5A} would add the values {2, 1, 1, 2, 1} which would result in a sorted set {1, 2}, which is shorter than the previous set.
Obviously, adding {1D, 2C, 3C, 4C, 5D} or {1E, 2D, 3D, 4D, 5E} would be the shortest sets, having only one item each, and I could use either one.
I don't have to include every number in the array. I just need to find the shortest set while including at least one number from every row.
Keep in mind that this is just an example array, and the arrays that I'm using are much, much larger. The smallest is 495x28. Brute force will take a VERY long time (28^495 passes). Is there a shortcut that someone knows, to find this in the least number of passes? I have C# code, but it's kind of long.
Edit:
Posting current code, as per request:
// Set an array of counters, Add enough to create largest initial array
int ListsCount = MatrixResults.Count();
int[] Counters = new int[ListsCount];
SortedSet<long> CurrentSet = new SortedSet<long>();
for (long X = 0; X < ListsCount; X++)
{
Counters[X] = 0;
CurrentSet.Add(X);
}
while (true)
{
// Compile sequence list from MatrixResults[]
SortedSet<long> ThisSet = new SortedSet<long>();
for (int X = 0; X < Count4; X ++)
{
ThisSet.Add(MatrixResults[X][Counters[X]]);
}
// if Sequence Length less than current low, set ThisSet as Current
if (ThisSet.Count() < CurrentSet.Count())
{
CurrentSet.Clear();
long[] TSI = ThisSet.ToArray();
for (int Y = 0; Y < ThisSet.Count(); Y ++)
{
CurrentSet.Add(TSI[Y]);
}
}
// Increment Counters
int Index = 0;
bool EndReached = false;
while (true)
{
Counters[Index]++;
if (Counters[Index] < MatrixResults[Index].Count()) break;
Counters[Index] = 0;
Index++;
if (Index >= ListsCount)
{
EndReached = true;
break;
}
Counters[Index]++;
}
// If all counters are fully incremented, then break
if (EndReached) break;
}
With all computations there is always a tradeoff, several factors are in play, like will You get paid for getting it perfect (in this case for me, no). This is a case of the best being the enemy of the good. How long can we spend on solving a problem and will it be sufficient to get close enough to fulfil the use case (imo) and when we can solve the problem without hand painting pixels in UHD resolution to get the idea of a key through, lets!
So, my choice is an approach which will get a covering set which is small and ehem... sometimes will be the smallest :) In essence because of the sequence in comparing would to be spot on be iterative between different strategies, comparing the length of the sets for different strategies - and for this evening of fun I chose to give one strategy which is I find defendable to be close to or equal the minimal set.
So this strategy is to observe the multi dimensional array as a sequence of lists that has a distinct value set each. Then if reducing the total amount of lists with the smallest in the remainder iteratively, weeding out any non used values in that smallest list when having reduced total set in each iteration we will get a path which is close enough to the ideal to be effective as it completes in milliseconds with this approach.
A critique of this approach up front is then that the direction you pass your minimal list in really would have to get iteratively varied to pick best, left to right, right to left, in position sequences X,Y,Z, ... because the amount of potential reducing is not equal. So to get close to the ideal iterations of sequences would have to be made for each iteration too until all combinations were covered, choosing the most reducing sequence. right - but I chose left to right, only!
Now I chose not to run compare execution against Your code, because of the way you instantiate your MatrixResults is an array of int arrays and not instantiated as a multidimension array, which your drawing is, so I went by Your drawing and then couldn't share data source with your code. No matter, you can make that conversion if you wish, onwards to generate sample data:
private int[,] CreateSampleArray(int xDimension, int yDimensions, Random rnd)
{
Debug.WriteLine($"Created sample array of dimensions ({xDimension}, {yDimensions})");
var array = new int[xDimension, yDimensions];
for (int x = 0; x < array.GetLength(0); x++)
{
for(int y = 0; y < array.GetLength(1); y++)
{
array[x, y] = rnd.Next(0, 4000);
}
}
return array;
}
The overall structure with some logging, I'm using xUnit to run the code in
[Fact]
public void SetCoverExperimentTest()
{
var rnd = new Random((int)DateTime.Now.Ticks);
var sw = Stopwatch.StartNew();
int[,] matrixResults = CreateSampleArray(rnd.Next(100, 500), rnd.Next(100, 500), rnd);
//So first requirement is that you must have one element per row, so lets get our unique rows
var listOfAll = new List<List<int>>();
List<int> listOfRow;
for (int y = 0; y < matrixResults.GetLength(1); y++)
{
listOfRow = new List<int>();
for (int x = 0; x < matrixResults.GetLength(0); x++)
{
listOfRow.Add(matrixResults[x, y]);
}
listOfAll.Add(listOfRow.Distinct().ToList());
}
var setFound = new HashSet<int>();
List<List<int>> allUniquelyRequired = GetDistinctSmallestList(listOfAll, setFound);
// This set now has all rows that are either distinctly different
// Or have a reordering of distinct values of that length value lists
// our HashSet has the unique value range
//Meaning any combination of sets with those values,
//grabbing any one for each set, prefering already chosen ones should give a covering total set
var leastSet = new LeastSetData
{
LeastSet = setFound,
MatrixResults = matrixResults,
};
List<Coordinate>? minSet = leastSet.GenerateResultsSet();
sw.Stop();
Debug.WriteLine($"Completed in {sw.Elapsed.TotalMilliseconds:0.00} ms");
Assert.NotNull(minSet);
//There is one for each row
Assert.False(minSet.Select(s => s.y).Distinct().Count() < minSet.Count());
//We took less than 25 milliseconds
var timespan = new TimeSpan(0, 0, 0, 0, 25);
Assert.True(sw.Elapsed < timespan);
//Outputting to debugger for the fun of it
var sb = new StringBuilder();
foreach (var coordinate in minSet)
{
sb.Append($"({coordinate.x}, {coordinate.y}) {matrixResults[coordinate.x, coordinate.y]},");
}
var debugLine = sb.ToString();
debugLine = debugLine.Substring(0, debugLine.Length - 1);
Debug.WriteLine("Resulting set: " + debugLine);
}
Now the more meaty iterative bits
private List<List<int>> GetDistinctSmallestList(List<List<int>> listOfAll, HashSet<int> setFound)
{
// Our smallest set must be a subset the distinct sum of all our smallest lists for value range,
// plus unknown
var listOfShortest = new List<List<int>>();
int shortest = int.MaxValue;
foreach (var list in listOfAll)
{
if (list.Count < shortest)
{
listOfShortest.Clear();
shortest = list.Count;
listOfShortest.Add(list);
}
else if (list.Count == shortest)
{
if (listOfShortest.Contains(list))
continue;
listOfShortest.Add(list);
}
}
var setFoundAddition = new HashSet<int>(setFound);
foreach (var list in listOfShortest)
{
foreach (var item in list)
{
if (setFound.Contains(item))
continue;
if (setFoundAddition.Contains(item))
continue;
setFoundAddition.Add(item);
}
}
//Now we can remove all rows with those found, we'll add the smallest later
var listOfAllRemainder = new List<List<int>>();
bool foundInList;
List<int> consumedWhenReducing = new List<int>();
foreach (var list in listOfAll)
{
foundInList = false;
foreach (int item in list)
{
if (setFound.Contains(item))
{
//Covered by data from last iteration(s)
foundInList = true;
break;
}
else if (setFoundAddition.Contains(item))
{
consumedWhenReducing.Add(item);
foundInList = true;
break;
}
}
if (!foundInList)
{
listOfAllRemainder.Add(list); //adding what lists did not have elements found
}
}
//Remove any from these smallestset lists that did not get consumed in the favour used pass before
if (consumedWhenReducing.Count == 0)
{
throw new Exception($"Shouldn't be possible to remove the row itself without using one of its values, please investigate");
}
var removeArray = setFoundAddition.Where(a => !consumedWhenReducing.Contains(a)).ToArray();
setFoundAddition.RemoveWhere(x => removeArray.Contains(x));
foreach (var value in setFoundAddition)
{
setFound.Add(value);
}
if (listOfAllRemainder.Count != 0)
{
//Do the whole thing again until there in no list left
listOfShortest.AddRange(GetDistinctSmallestList(listOfAllRemainder, setFound));
}
return listOfShortest; //Here we will ultimately have the sum of shortest lists per iteration
}
To conclude: I hope to have inspired You, at least I had fun coming up with a best approximate, and should you feel like completing the code, You're very welcome to grab what You like.
Obviously we should really track the sequence we go through the shortest lists, after all it is of significance if we start by reducing the total distinct lists by element at position 0 or 0+N and which one we reduce with after. I mean we must have one of those values but each time consuming each value has removed most of the total list all it really produces is a value range and the range consumption sequence matters to the later iterations - Because a position we didn't reach before there were no others left e.g. could have remove potentially more than some which were covered. You get the picture I'm sure.
And this is just one strategy, One may as well have chosen the largest distinct list even within the same framework and if You do not iteratively cover enough strategies, there is only brute force left.
Anyways you'd want an AI to act. Just like a human, not to contemplate the existence of universe before, after all we can reconsider pretty often with silicon brains as long as we can do so fast.
With any moving object at least, I'd much rather be 90% on target correcting every second while taking 14 ms to get there, than spend 2 seconds reaching 99% or the illusive 100% => meaning we should stop the vehicle before the concrete pillar or the pram or conversely buy the equity when it is a good time to do so, not figuring out that we should have stopped, when we are allready on the other side of the obstacle or that we should've bought 5 seconds ago, but by then the spot price already jumped again...
Thus the defense rests on the notion that it is opinionated if this solution is good enough or simply incomplete at best :D
I realize it's pretty random, but just to say that although this sketch is not entirely indisputably correct, it is easy to read and maintain and anyways the question is wrong B-] We will very rarely need the absolute minimal set and when we do the answer will be much longer :D
... woopsie, forgot the support classes
public struct Coordinate
{
public int x;
public int y;
public override string ToString()
{
return $"({x},{y})";
}
}
public struct CoordinateValue
{
public int Value { get; set; }
public Coordinate Coordinate { get; set; }
public override string ToString()
{
return string.Concat(Coordinate.ToString(), " ", Value.ToString());
}
}
public class LeastSetData
{
public HashSet<int> LeastSet { get; set; }
public int[,] MatrixResults { get; set; }
public List<Coordinate> GenerateResultsSet()
{
HashSet<int> chosenValueRange = new HashSet<int>();
var chosenSet = new List<Coordinate>();
for (int y = 0; y < MatrixResults.GetLength(1); y++)
{
var candidates = new List<CoordinateValue>();
for (int x = 0; x < MatrixResults.GetLength(0); x++)
{
if (LeastSet.Contains(MatrixResults[x, y]))
{
candidates.Add(new CoordinateValue
{
Value = MatrixResults[x, y],
Coordinate = new Coordinate { x = x, y = y }
}
);
continue;
}
}
if (candidates.Count == 0)
throw new Exception($"OMG Something's wrong! (this row did not have any of derived range [y: {y}])");
var done = false;
foreach (var c in candidates)
{
if (chosenValueRange.Contains(c.Value))
{
chosenSet.Add(c.Coordinate);
done = true;
break;
}
}
if (!done)
{
var firstCandidate = candidates.First();
chosenSet.Add(firstCandidate.Coordinate);
chosenValueRange.Add(firstCandidate.Value);
}
}
return chosenSet;
}
}
This problem is NP hard.
To show that, we have to take a known NP hard problem, and reduce it to this one. Let's do that with the Set Cover Problem.
We start with a universe U of things, and a collection S of sets that covers the universe. Assign each thing a row, and each set a number. This will fill different numbers of columns for each row. Fill in a rectangle by adding new numbers.
Now solve your problem.
For each new number in your solution that didn't come from a set in the original problem, we can replace it with another number in the same row that did come from a set.
And now we turn numbers back into sets and we have a solution to the Set Cover Problem.
The transformations from set cover to your problem and back again are both O(number_of_elements * number_of_sets) which is polynomial in the input. And therefore your problem is NP hard.
Conversely if you replace each number in the matrix with the set of rows covered, your problem turns into the Set Cover Problem. Using any existing solver for set cover then gives a reasonable approach for your problem as well.
The code is not particularly tidy or optimised, but illustrates the approach I think #btilly is suggesting in his answer (E&OE) using a bit of recursion (I was going for intuitive rather than ideal for scaling, so you may have to work an iterative equivalent).
From the rows with their values make a "values with the rows that they appear in" counterpart. Now pick a value, eliminate all rows in which it appears and solve again for the reduced set of rows. Repeat recursively, keeping only the shortest solutions.
I know this is not terribly readable (or well explained) and may come back to tidy up in the morning, so let me know if it does what you want (is worth a bit more of my time;-).
// Setup
var rowValues = new Dictionary<int, HashSet<int>>
{
[0] = new() { 0, 2, 3, 4, 5 },
[1] = new() { 1, 2, 4, 5, 6 },
[2] = new() { 1, 3, 4, 5, 6 },
[3] = new() { 2, 3, 4, 5, 6 },
[4] = new() { 1, 2, 3, 4, 5 }
};
Dictionary<int, HashSet<int>> ValueRows(Dictionary<int, HashSet<int>> rv)
{
var vr = new Dictionary<int, HashSet<int>>();
foreach (var row in rv.Keys)
{
foreach (var value in rv[row])
{
if (vr.ContainsKey(value))
{
if (!vr[value].Contains(row))
vr[value].Add(row);
}
else
{
vr.Add(value, new HashSet<int> { row });
}
}
}
return vr;
}
List<int> FindSolution(Dictionary<int, HashSet<int>> rAndV)
{
if (rAndV.Count == 0) return new List<int>();
var bestSolutionSoFar = new List<int>();
var vAndR = ValueRows(rAndV);
foreach (var v in vAndR.Keys)
{
var copyRemove = new Dictionary<int, HashSet<int>>(rAndV);
foreach (var r in vAndR[v])
copyRemove.Remove(r);
var solution = new List<int>{ v };
solution.AddRange(FindSolution(copyRemove));
if (bestSolutionSoFar.Count == 0 || solution.Count > 0 && solution.Count < bestSolutionSoFar.Count)
bestSolutionSoFar = solution;
}
return bestSolutionSoFar;
}
var solution = FindSolution(rowValues);
Console.WriteLine($"Optimal solution has values {{ {string.Join(',', solution)} }}");
output Optimal solution has values { 4 }

Getting N x N dimension data from quad tree is very slow in c#

I am using quad-tree structure for my data processing application in c#, it is similar to hashlife algorithm. Getting data N x N (eg. 2000 x 2000) dimension data from quad-tree is very very slow.
how can i optimize it for extracting large data from quad tree.
Edit:
Here is the code i used to extract the data in recursive manner
public int Getvalue(long x, long y)
{
if (level == 0)
{
return value;
}
long offset = 1 << (level - 2);
if (x < 0)
{
if (y < 0)
{
return NW.Getvalue(x + offset, y + offset);
}
else
{
return SW.Getvalue(x + offset, y - offset);
}
}
else
{
if (y < 0)
{
return NE.Getvalue(x - offset, y + offset);
}
else
{
return SE.Getvalue(x - offset, y - offset);
}
}
}
outer code
int limit = 500;
List<int> ExData = new List<int>();
for (int row = -limit; row < limit; row++)
{
for (int col = -limit; col < limit; col++)
{
ExData.Add(Root.Getvalue(row, col));
//sometimes two dimension array
}
}
A quadtree or any other structure isn't going to help if you're going to visit every element (i.e. level 0 leaf node). Whatever code gets the value in a given cell, an exhaustive tour will visit 4,000,000 points. Your way does arithmetic over and over again as it goes down the tree at each visit.
So for element (-limit,-limit) the code visits every tier and then returns. For the next element it visits every tier and then returns and so on. That is very labourious.
It will speed up if you make the process of adding to the list itself recursively visiting each quadrant once.
NB: I'm not a C# programmer so please correct any errors here:
public void AppendValues(List<int> ExData) {
if(level==0){
ExData.Add(value);
} else{
NW.AppendValues(ExData);
NE.AppendValues(ExData);
SW.AppendValues(ExData);
SE.AppendValues(ExData);
}
}
That will append all the values though not in the raster-scan (row-by-row) order of the original code!
A further speed up can be achieved if you are dealing with sparse data. So if in many cases nodes are empty or even 'solid' (all zero or one value) you could set the nodes to null and then use zero or the solid value.
That trick works well in Hashlife for Conway Life but depends on your application. Interesting patterns have large areas of 'dead' cells that will always propagate to dead and rarely need considering in detail.
I'm not sure what 25-40% means as 'duplicates'. If they aren't some fixed value or are scattered across the tree large 'solid' regions are likely to be rare and that trick may not help here.
Also, if you actually need to only get the values in some region (e.g. rectangle) you need to be a bit cleverer about how you work out which sub-region of each quadrant you need using offset but it will still be far more efficient than 'brute' force tour of every element. Make sure the code realises when the region of interest is entirely outside the node in hand and return quickly.
All this said if creating a list of all the values in the quad-tree is a common activity in your application, a quad-tree may not be the answer you need. A map simply mapping (row,col) to value is pre-made and again very efficient if there is some common default value (e.g. zero).
It may help to create an iterator object rather than add millions of items to a list; particularly if the list is transient and destroyed soon after.
More information about the actual application is required to understand if a quadtree is the answer here. The information provided so far suggests it isn't.

How to get an item from a list by its property, and then use it's other property?

class Node
{
int number;
Vector2 position;
public Node(int number, Vector2 position)
{
this.number = number;
this.position = position;
}
}
List<Node>nodes = new List<Node>();
for (int i = 0; i < nodes.Count; i++) //basically a foreach
{
// Here i would like to find each node from the list, in the order of their numbers,
// and check their vectors
}
So, as the code pretty much tells, i am wondering how i can
find a specific node from the list, specifically one with the attribute "numbers" being i (Eg going through all of them in the order of their "number" attribute).
check its other attribute
Have tried:
nodes.Find(Node => Node.number == i);
Node test = nodes[i];
place = test.position
they cant apparently access node.number / node.position due to its protection level.
Also the second one has the problem that the nodes have to be sorted first.
Also looked at this question
but [] solution is in the "Tried" caterology, foreach solution doesn't seem to work for custom classes.
I'm a coding newbie (Like 60 hours), so don't
Explain in a insanely hard way.
Say i am dumb for not knowing a this basic thing.
Thanks!
I would add properties for Number and Position, making them available to outside users (currently their access modifier is private):
class Node
{
public Node(int number, Vector2 position)
{
this.Number = number;
this.Position = position;
}
public int Number { get; private set; }
public Vector2 Position { get; private set; }
}
Now your original attempt should work:
nodes.Find(node => node.Number == i);
However, it sounds like sorting the List<Node> and then accessing by index would be faster. You would be sorting the list once and directly indexing the list vs looking through the list on each iteration for the item you want.
List<Node> SortNodes(List<Node> nodes)
{
List<Node> sortedNodes = new List<Node>();
int length = nodes.Count; // The length gets less and less every time, but we still need all the numbers!
int a = 0; // The current number we are looking for
while (a < length)
{
for (int i = 0; i < nodes.Count; i++)
{
// if the node's number is the number we are looking for,
if (nodes[i].number == a)
{
sortedNodes.Add(list[i]); // add it to the list
nodes.RemoveAt(i); // and remove it so we don't have to search it again.
a++; // go to the next number
break; // break the loop
}
}
}
return sortedNodes;
}
This is a simple sort function. You need to make the number property public first.
It will return a list full of nodes in the order you want to.
Also: The searching goes faster as more nodes are added to the sorted nodes list.
Make sure you that all nodes have a different number! Otherwise it would get stuck in an infinite loop!

What is wrong with my SURF/KMeans classifer

I am attempting to create a classifier/predictor using SURF and a Naive Bayesian. I am pretty much following the technique from "Visual Categorization with Bags of Keypoints" by Dance, Csurka... I am using SURF instead of SIFT.
My results are pretty horrendous and I am not sure where my error lies. I am using 20 car samples (ham) and 20 motorcycle samples(spam) from the CalTec set. I suspect it is in the way I am creating my vocabulary. What I can see is that the EMGU/OpenCV kmeans2 classifier is returning different results given the same SURF descriptor input. That makes me suspicious. Here is my code so far.
public Matrix<float> Extract<TColor, TDepth>(Image<TColor, TDepth> image)
where TColor : struct, Emgu.CV.IColor
where TDepth : new()
{
ImageFeature[] modelDescriptors;
using (var imgGray = image.Convert<Gray, byte>())
{
var modelKeyPoints = surfCPU.DetectKeyPoints(imgGray, null);
//the surf descriptor is a size 64 vector describing the intensity pattern surrounding
//the corresponding modelKeyPoint
modelDescriptors = surfCPU.ComputeDescriptors(imgGray, null, modelKeyPoints);
}
var samples = new Matrix<float>(modelDescriptors.Length, DESCRIPTOR_COUNT);//SURF Descriptors have 64 samples
for (int k = 0; k < modelDescriptors.Length; k++)
{
for (int i = 0; i < modelDescriptors[k].Descriptor.Length; i++)
{
samples.Data[k, i] = modelDescriptors[k].Descriptor[i];
}
}
//group descriptors into clusters using K-means to form the feature vectors
//create "vocabulary" based on square-error partitioning K-means
var centers = new Matrix<float>(CLUSTER_COUNT, samples.Cols, 1);
var term = new MCvTermCriteria();
var labelVector = new Matrix<int>(modelDescriptors.Length, 1);
var cluster = CvInvoke.cvKMeans2(samples, CLUSTER_COUNT, labelVector, term, 3, IntPtr.Zero, 0, centers, IntPtr.Zero);
//this is the quantized feature vector as described in Dance, Csurska Bag of Keypoints (2004)
var keyPoints = new Matrix<float>(1, CLUSTER_COUNT);
//quantize the vector into a feature vector
//making a histogram of the result counts
for (int i = 0; i < labelVector.Rows; i++)
{
var value = labelVector.Data[i, 0];
keyPoints.Data[0, value]++;
}
//normalize the histogram since it will have different amounts of points
keyPoints = keyPoints / keyPoints.Norm;
return keyPoints;
}
The output gets fed into NormalBayesClassifier. This is how I train it.
Parallel.For(0, hamCount, i =>
{
using (var img = new Image<Gray, byte>(_hams[i].FullName))
{
var features = _extractor.Extract(img);
features.CopyTo(trainingData.GetRow(i));
trainingClass.Data[i, 0] = 1;
}
});
Parallel.For(0, spamCount, j =>
{
using (var img = new Image<Gray, byte>(_spams[j].FullName))
{
var features = img.ClassifyFeatures(_extractor);
features.CopyTo(trainingData.GetRow(j));
trainingClass.Data[j + hamCount, 0] = 0;
}
});
using (var classifier = new NormalBayesClassifier())
{
if (classifier.Train(trainingData, trainingClass, null, null, false))
{
classifier.Save(_statModelFilePath);
}
}
When I call Predict using the NormalBayesClassifier it returns 1(match) for all of the training samples...ham and spam.
Any help would be greatly appreciated.
Edit.
One other note is that I have chosen CLUSTER_COUNT from 5 to 500 all with the same result.
The problem was more conceptual than technical. I did not understand that the K Means cluster was building the vocabulary for the "entire" data set. The way to do it correctly is to give the CvInvoke.cvKMeans2 call a training matrix containing all of the features for every image. I was building the vocabulary each time based on a single image.
My final solution involved pulling the SURF code into its own method and running that on each ham and spam image. I then used the massive result set to build a training matrix and gave that to the CvInvoke.cvKMeans2 method. It took quite a long time to finish the training. I have about 3000 images total.
My results were better. The prediction rate was 100% accurate with the training data. My problem now is that I am likely suffering from over fitting because its prediction rate is still poor for non-training data. I will play around with the hessian threshold in the SURF algorithm as well as the cluster count to see if I can minimize the over fitting.

Dijkstra Shortest-Paths fast recalculation if only an edge got removed

I'm trying to calculate the shortest paths. This does work with the below pasted implementation of Dijkstra. However I want to speed the things up.
I use this implementation to decide to which field I want to go next. The graph represents an two dimensional array where all fields are connected to each neighbours. But over time the following happens: I need to remove some edges (there are obstacles). The start node is my current position which does also change over time.
This means:
I do never add a node, never add a new edge, never change the weight of an edge. The only operation is removing an edge
The start node does change over time
Questions:
Is there an algorithm wich can do a fast recalculation of the shortest-paths when I know that the only change in the graph is the removal of an edge?
Is there an algorithm wich allows me to fast recalculate the shortest path when the start node changes only to one of it's neighbours?
Is another algorithm maybe better suited for my problem?
Thx for your help
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text;
public class Dijkstra<T>
{
private Node<T> calculatedStart;
private ReadOnlyCollection<Node<T>> Nodes {
get ;
set ;
}
private ReadOnlyCollection<Edge<T>> Edges {
get;
set;
}
private List<Node<T>> NodesToInspect {
get;
set ;
}
private Dictionary<Node<T>, int> Distance {
get ;
set ;
}
private Dictionary<Node<T>, Node<T>> PreviousNode {
get;
set ;
}
public Dijkstra (ReadOnlyCollection<Edge<T>> edges, ReadOnlyCollection<Node<T>> nodes)
{
Edges = edges;
Nodes = nodes;
NodesToInspect = new List<Node<T>> ();
Distance = new Dictionary<Node<T>, int> ();
PreviousNode = new Dictionary<Node<T>, Node<T>> ();
foreach (Node<T> n in Nodes) {
PreviousNode.Add (n, null);
NodesToInspect.Add (n);
Distance.Add (n, int.MaxValue);
}
}
public LinkedList<T> GetPath (T start, T destination)
{
Node<T> startNode = new Node<T> (start);
Node<T> destinationNode = new Node<T> (destination);
CalculateAllShortestDistances (startNode);
// building path going back from the destination to the start always taking the nearest node
LinkedList<T> path = new LinkedList<T> ();
path.AddFirst (destinationNode.Value);
while (PreviousNode[destinationNode] != null) {
destinationNode = PreviousNode [destinationNode];
path.AddFirst (destinationNode.Value);
}
path.RemoveFirst ();
return path;
}
private void CalculateAllShortestDistances (Node<T> startNode)
{
if (startNode.Value.Equals (calculatedStart)) {
return;
}
Distance [startNode] = 0;
while (NodesToInspect.Count > 0) {
Node<T> nearestNode = GetNodeWithSmallestDistance ();
// if we cannot find another node with the function above we can exit the algorithm and clear the
// nodes to inspect because they would not be reachable from the start or will not be able to shorten the paths...
// this algorithm does also implicitly kind of calculate the minimum spanning tree...
if (nearestNode == null) {
NodesToInspect.Clear ();
} else {
foreach (Node<T> neighbour in GetNeighborsFromNodesToInspect(nearestNode)) {
// calculate distance with the currently inspected neighbour
int dist = Distance [nearestNode] + GetDirectDistanceBetween (nearestNode, neighbour);
// set the neighbour as shortest if it is better than the current shortest distance
if (dist < Distance [neighbour]) {
Distance [neighbour] = dist;
PreviousNode [neighbour] = nearestNode;
}
}
NodesToInspect.Remove (nearestNode);
}
}
calculatedStart = startNode;
}
private Node<T> GetNodeWithSmallestDistance ()
{
int distance = int.MaxValue;
Node<T> smallest = null;
foreach (Node<T> inspectedNode in NodesToInspect) {
if (Distance [inspectedNode] < distance) {
distance = Distance [inspectedNode];
smallest = inspectedNode;
}
}
return smallest;
}
private List<Node<T>> GetNeighborsFromNodesToInspect (Node<T> n)
{
List<Node<T>> neighbors = new List<Node<T>> ();
foreach (Edge<T> e in Edges) {
if (e.Start.Equals (n) && NodesToInspect.Contains (n)) {
neighbors.Add (e.End);
}
}
return neighbors;
}
private int GetDirectDistanceBetween (Node<T> startNode, Node<T> endNode)
{
foreach (Edge<T> e in Edges) {
if (e.Start.Equals (startNode) && e.End.Equals (endNode)) {
return e.Distance;
}
}
return int.MaxValue;
}
}
Is there an algorithm wich can do a fast recalculation of the shortest-paths when I know that the only change in the graph is the removal of an edge?
Is there an algorithm wich allows me to fast recalculate the shortest path when the start node changes only to one of it's neighbours?
The answer to both of these questions is yes.
For the first case, the algorithm you're looking for is called LPA* (sometimes, less commonly, called Incremental A*. The title on that paper is outdated). It's a (rather complicated) modification to A* that allows fast recalculation of best paths when only a few edges have changed.
Like A*, LPA* requires an admissible distance heuristic. If no such heuristic exists, you can just set it to 0. Doing this in A* will essentially turn it into Djikstra's algorithm; doing this in LPA* will turn it into an obscure, rarely-used algorithm called DynamicSWSF-SP.
For the second case, you're looking for D*-Lite. It is a pretty simple modification to LPA* (simple, at least, once you understand LPA*) that does incremental pathfinding as the unit moves from start-to-finish and new information is gained (edges are added/removed/changed). It is primarily used for robots traversing an unknown or partially-known terrain.
I've written up a fairly comprehensive answer (with links to papers, in the question) on various pathfinding algorithms here.

Categories

Resources