K-means with initial centers

K-means with initial centers - c#

I'm doing K-means clustering of n points and k centers.
To start off, here's my code so far:
A class for a point:
public class Point
{
public int Id { get; set; }
public double X { get; set; }
public double Y { get; set; }
public Point()
{
Id = -1;
X = -1;
Y = -1;
}
public Point(int id, double x, double y)
{
this.Id = id;
this.X = x;
this.Y = y;
}
public static double FindDistance(Point pt1, Point pt2)
{
double x1 = pt1.X, y1 = pt1.Y;
double x2 = pt2.X, y2 = pt2.Y;
double distance = Math.Sqrt(Math.Pow(x2 - x1, 2.0) + Math.Pow(y2 - y1, 2.0));
return (distance);
}
}
A class for data:
public class PointCollection : List<Point>
{
public Point Centroid { get; set; }
public PointCollection()
: base()
{
Centroid = new Point();
}
public void AddPoint(Point p)
{
this.Add(p);
UpdateCentroid();
}
public Point RemovePoint(Point p)
{
Point removedPoint = new Point(p.Id, p.X, p.Y);
this.Remove(p);
UpdateCentroid();
return (removedPoint);
}
public void UpdateCentroid()
{
double xSum = (from p in this select p.X).Sum();
double ySum = (from p in this select p.Y).Sum();
Centroid.X = (xSum / (double)this.Count);
Centroid.Y = (ySum / (double)this.Count);
}
}
Main class:
public class KMeans
{
public static List<PointCollection> DoKMeans(PointCollection points, int clusterCount)
{
List<PointCollection> allClusters = new List<PointCollection>();
List<List<Point>> allGroups = ListUtility.SplitList<Point>(points, clusterCount);
foreach (List<Point> group in allGroups)
{
PointCollection cluster = new PointCollection();
cluster.AddRange(group);
allClusters.Add(cluster);
}
int movements = 1;
while (movements > 0)
{
movements = 0;
foreach (PointCollection cluster in allClusters)
{
for (int pointIndex = 0; pointIndex < cluster.Count; pointIndex++)
{
Point point = cluster[pointIndex];
int nearestCluster = FindNearestCluster(allClusters, point);
if (nearestCluster != allClusters.IndexOf(cluster))
{
if (cluster.Count > 1)
{
Point removedPoint = cluster.RemovePoint(point);
allClusters[nearestCluster].AddPoint(removedPoint);
movements += 1;
}
}
}
}
}
return (allClusters);
}
public static int FindNearestCluster(List<PointCollection> allClusters, Point point)
{
double minimumDistance = 0.0;
int nearestClusterIndex = -1;
for (int k = 0; k < allClusters.Count; k++)
{
double distance = Point.FindDistance(point, allClusters[k].Centroid);
if (k == 0)
{
minimumDistance = distance;
nearestClusterIndex = 0;
}
else if (minimumDistance > distance)
{
minimumDistance = distance;
nearestClusterIndex = k;
}
}
return (nearestClusterIndex);
}
}
And finally helping function for list split:
public static List<List<T>> SplitList<T>(List<T> items, int groupCount)
{
List<List<T>> allGroups = new List<List<T>>();
int startIndex = 0;
int groupLength = (int)Math.Round((double)items.Count / (double)groupCount, 0);
while (startIndex < items.Count)
{
List<T> group = new List<T>();
group.AddRange(items.GetRange(startIndex, groupLength));
startIndex += groupLength;
if (startIndex + groupLength > items.Count)
{
groupLength = items.Count - startIndex;
}
allGroups.Add(group);
}
if (allGroups.Count > groupCount && allGroups.Count > 2)
{
allGroups[allGroups.Count - 2].AddRange(allGroups.Last());
allGroups.RemoveAt(allGroups.Count - 1);
}
return (allGroups);
}
So, now I'm trying to write a second method for the main class, which would accept a predefined starting centres. I have trouble grasping that though, as I can't find anything on the internet where the k-means algorithm would use initial centers. Can someone point me in a direction of such guide or give me any ideas how to modify the code? Thanks.
Edit: Maybe something more why am I trying to do this: I try to write LBG algorithm using k-means, like so https://onlinecourses.science.psu.edu/stat557/node/67
I can access the computed centers for splitting in each of the steps with my code, however I need to find a way to feed them back to the k-means class. Like, if I calculated the starting centre, i need to put this centre and another offset by epsilon into k-means algorithm.
Edit2: Code now in English(I hope)

Found a solution, maybe someone will use:
public static List<PointCollection> DoKMeans(PointCollection points, int clusterCount, Point[] startingCentres)
{
// code...
int ctr = 0;
foreach (List<Point> group in allGroups)
{
PointCollection cluster = new PointCollection();
cluster.c.X = startingCentres[ctr].X;
cluster.c.Y = startingCentres[ctr].Y;
cluster.AddRange(group);
allClusters.Add(cluster);
}
// rest of code the same
}

Related

Multi-threading with TPL - Access Internal Class Properties

I am using the TPL library to parallelize a 2D grid operation. I have extracted a simple example from my actual code to illustrate what I am doing. I am getting the desired results I want and my computation times are sped up by the number of processors on my laptop (12).
I would love to get some advice or opinions on my code as far as how my properties are declared. Again, it works as expected, but wonder if the design could be better. Thank you in advance.
My simplified code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace Gridding
{
public abstract class Base
{
/// <summary>
/// Values representing the mesh values where each node in the gridis assigned som value.
/// </summary>
public float[,] Values { get; set; }
public abstract void Compute();
}
public class Derived : Base
{
/// <summary>
/// Make the mesh readonly. Is this necessary?
/// </summary>
readonly Mesh MyMesh;
Derived(Mesh mesh)
{
MyMesh = mesh;
}
public override void Compute()
{
Values = new float[MyMesh.NX, MyMesh.NY];
double[] xn = MyMesh.GetXNodes();
double[] yn = MyMesh.GetYNodes();
/// Paralellize the threads along the columns of the grid using the Task Paralllel Library (TPL).
Parallel.For(0, MyMesh.NX, i =>
{
Run(i, xn, yn);
});
}
private void Run(int i, double[] xn, double[] yn)
{
/// Some long operation that parallelizes along the columns of a mesh/grid
double x = xn[i];
for (int j = 0; j < MyMesh.NY; j++)
{
/// Again, longer operation here
double y = yn[j];
double someValue = Math.Sqrt(x * y);
Values[i, j] = (float)someValue;
}
}
static void Main(string[] args)
{
int nx = 100;
int ny = 120;
double x0 = 0.0;
double y0 = 0.0;
double width = 100;
double height = 120;
Mesh mesh = new Mesh(nx, ny, x0, y0, width, height);
Base tplTest = new Derived(mesh);
tplTest.Compute();
float[,] values = tplTest.Values;
Console.Read();
}
/// <summary>
/// A simple North-South oriented grid.
/// </summary>
class Mesh
{
public int NX { get; } = 100;
public int NY { get; set; } = 150;
public double XOrigin { get; set; } = 0.0;
public double YOrigin { get; set; } = 0.0;
public double Width { get; set; } = 100.0;
public double Height { get; set; } = 150.0;
public double DX { get; }
public double DY { get; }
public Mesh(int nx, int ny, double xOrigin, double yOrigin, double width, double height)
{
NX = nx;
NY = ny;
XOrigin = xOrigin;
YOrigin = yOrigin;
Width = width;
Height = height;
DX = Width / (NX - 1);
DY = Height / (NY - 1);
}
public double[] GetYNodes()
{
double[] yNodeLocs = new double[NY];
for (int i = 0; i < NY; i++)
{
yNodeLocs[i] = YOrigin + i * DY;
}
return yNodeLocs;
}
public double[] GetXNodes()
{
double[] xNodeLocs = new double[NX];
for (int i = 0; i < NX; i++)
{
xNodeLocs[i] = XOrigin + i * DX;
}
return xNodeLocs;
}
}
}
}

With only 100 to 150 array elements, parallelism setup overhead would offset the gains. You could cache the result of GetXNodes() and GetYNodes() and invalidate the cached values when XOrigin, NX, YOrigin or NY are modified.

A* pathfinding algorithm - got stuck calculating shortest path

I'm trying to implement A* algorithm in order to find the shortest path in given grid.
My Node class:
public class Node : IComparable
{
public Node(int row, int col, Node previousNode = null, double distance = double.PositiveInfinity)
{
this.Row = row;
this.Col = col;
this.PreviousNode = previousNode;
this.Distance = distance;
}
public int Row { get; }
public int Col { get; }
public bool IsVisited { get; internal set; }
public double Distance { get; set; }
public int Weight { get; set; } = 1;
public double GScore { get; set; } = double.PositiveInfinity;
public double H { get; set; }
public double FScore => this.GScore + this.H;
public NodeType? NodeType { get; internal set; }
public Node PreviousNode { get; set; }
public override bool Equals(object obj)
{
var otherNode = obj as Node;
return this.Equals(otherNode);
}
protected bool Equals(Node other)
=> this.Row == other.Row && this.Col == other.Col;
public override int GetHashCode()
{
unchecked
{
return (this.Row * 397) ^ this.Col;
}
}
public int CompareTo(object obj)
{
var otherNode = obj as Node;
if (this.FScore == otherNode.FScore)
{
if (this.H >= otherNode.H)
{
return 1;
}
else if (this.H < otherNode.H)
{
return -1;
}
}
return this.FScore.CompareTo(otherNode.FScore);
}
}
A* algo class:
public override Result Execute(Node[,] grid, Node startNode, Node endNode)
{
var heap = new MinHeap<Node>();
var allSteps = new HashSet<Node>();
startNode.GScore = 0;
startNode.H = ManhattanDistance(startNode, endNode);
startNode.IsVisited = true;
heap.Add(startNode);
while (heap.Count != 0)
{
var currentNode = heap.Pop();
if (currentNode.NodeType == NodeType.Wall)
continue;
allSteps.Add(currentNode);
if (currentNode.Equals(endNode))
{
return new Result(allSteps, this.GetAllNodesInShortestPathOrder(currentNode));
}
var rowDirection = new[] { -1, +1, 0, 0 };
var columnDirection = new[] { 0, 0, +1, -1 };
for (int i = 0; i < 4; i++)
{
var currentRowDirection = currentNode.Row + rowDirection[i];
var currentColDirection = currentNode.Col + columnDirection[i];
if ((currentRowDirection < 0 || currentColDirection < 0)
|| (currentRowDirection >= grid.GetLength(0)
|| currentColDirection >= grid.GetLength(1)))
{
continue;
}
var nextNode = grid[currentRowDirection, currentColDirection];
AddNodeToHeap(currentNode, nextNode, endNode, heap);
}
}
return new Result(allSteps);
}
private void AddNodeToHeap(Node currentNode, Node nextNode, Node endNode, MinHeap<Node> heap)
{
if (nextNode.IsVisited || nextNode.GScore < currentNode.GScore)
return;
var g = currentNode.GScore + nextNode.Weight;
var h = ManhattanDistance(nextNode, endNode);
if (g + h < nextNode.FScore)
{
nextNode.GScore = g;
nextNode.H = h;
nextNode.PreviousNode = currentNode;
nextNode.IsVisited = true;
}
heap.Add(nextNode);
}
private static int ManhattanDistance(Node currentNode, Node endNode)
{
var dx = Math.Abs(currentNode.Row - endNode.Row);
var dy = Math.Abs(currentNode.Col - endNode.Col);
return dx + dy;
}
Custom MinHeap class:
public class MinHeap<T>
{
private readonly IComparer<T> comparer;
private readonly List<T> list = new List<T> { default };
public MinHeap()
: this(default(IComparer<T>))
{
}
public MinHeap(IComparer<T> comparer)
{
this.comparer = comparer ?? Comparer<T>.Default;
}
public MinHeap(Comparison<T> comparison)
: this(Comparer<T>.Create(comparison))
{
}
public int Count => this.list.Count - 1;
public void Add(T element)
{
this.list.Add(element);
this.ShiftUp(this.list.Count - 1);
}
public T Pop()
{
T result = this.list[1];
this.list[1] = this.list[^1];
this.list.RemoveAt(this.list.Count - 1);
this.ShiftDown(1);
return result;
}
private static int Parent(int i) => i / 2;
private static int Left(int i) => i * 2;
private static int Right(int i) => i * 2 + 1;
private void ShiftUp(int i)
{
while (i > 1)
{
int parent = Parent(i);
if (this.comparer.Compare(this.list[i], this.list[parent]) > 0)
{
return;
}
(this.list[parent], this.list[i]) = (this.list[i], this.list[parent]);
i = parent;
}
}
private void ShiftDown(int i)
{
for (int left = Left(i); left < this.list.Count; left = Left(i))
{
int smallest = this.comparer.Compare(this.list[left], this.list[i]) <= 0 ? left : i;
int right = Right(i);
if (right < this.list.Count && this.comparer.Compare(this.list[right], this.list[smallest]) <= 0)
{
smallest = right;
}
if (smallest == i)
{
return;
}
(this.list[i], this.list[smallest]) = (this.list[smallest], this.list[i]);
i = smallest;
}
}
}
The problem is that it doesn't find the optimal path when I have some weights on the map. For example:
Every square on the grid which is marked as a weight node has weight of 10 otherwise it's 1.
Here's example:
Example grid with 3 weight nodes - green node is start node, red node is end node and the dumbbell node is weight node.
When I run the algorithm I get the following result.
It's clearly visible that this is not the shortest path since the algorithm goes through the first node which has weight 1 and then the next node with weight 10 instead of just passing one 10 weight node. The shortest path should've been the red one which I've marked.
P.S I've managed to make it respect the weights by adding new heuristic function when calculating GCost and it now calculates the path but instead of one straight line I get some strange path:
Thank you in advance!

#jdweng
I actually fixed the bug by implementing an additional method which adds additional weight to the GScore
blue squares - all steps which the algorithm took in order to find the shortest final path
A* Algo with weights
A* algo without weights
[
Dijkstra Algo with weights
Dijkstra Algo without weights
private (double weight, NodeDirection? Direction) GetDistanceAndDirection(Node nodeOne, Node nodeTwo)
{
var x1 = nodeOne.Row;
var y1 = nodeOne.Col;
var x2 = nodeTwo.Row;
var y2 = nodeTwo.Col;
if (x2 < x1 && y1 == y2)
{
switch (nodeOne.Direction)
{
case NodeDirection.Up:
return (1, NodeDirection.Up);
case NodeDirection.Right:
return (2, NodeDirection.Up);
case NodeDirection.Left:
return (2, NodeDirection.Up);
case NodeDirection.Down:
return (3, NodeDirection.Up);
}
}
else if (x2 > x1 && y1 == y2)
{
switch (nodeOne.Direction)
{
case NodeDirection.Up:
return (3, NodeDirection.Down);
case NodeDirection.Right:
return (2, NodeDirection.Down);
case NodeDirection.Left:
return (2, NodeDirection.Down);
case NodeDirection.Down:
return (1, NodeDirection.Down);
}
}
if (y2 < y1 && x1 == x2)
{
switch (nodeOne.Direction)
{
case NodeDirection.Up:
return (2, NodeDirection.Left);
case NodeDirection.Right:
return (3, NodeDirection.Left);
case NodeDirection.Left:
return (1, NodeDirection.Left);
case NodeDirection.Down:
return (2, NodeDirection.Left);
}
}
else if (y2 > y1 && x1 == x2)
{
switch (nodeOne.Direction)
{
case NodeDirection.Up:
return (2, NodeDirection.Right);
case NodeDirection.Right:
return (1, NodeDirection.Right);
case NodeDirection.Left:
return (3, NodeDirection.Right);
case NodeDirection.Down:
return (2, NodeDirection.Right);
}
}
return default;
}
and then AddNodeToHeapMethod()
private void AddNodeToHeap(Node currentNode, Node nextNode, Node endNode, MinHeap<Node> heap)
{
if (nextNode.IsVisited)
return;
var (additionalWeight, direction) = this.GetDistanceAndDirection(currentNode, nextNode);
var g = currentNode.GScore+ nextNode.Weight + additionalWeight;
var h = this.ManhattanDistance(nextNode, endNode);
if (g < nextNode.GScore)
{
nextNode.GScore= g;
nextNode.H = h;
nextNode.PreviousNode = currentNode;
nextNode.IsVisited = true;
}
currentNode.Direction = direction;
heap.Add(nextNode);
}

how to to multiply and divide in a static stack?

This is the static array I have been given in making a RPN calculator. From this code the RPN calculator adds and subtracts. Now I need to extend my code to multiply and divide but I cant I don't know how.
public class IntStack
{
private const int maxsize = 10;
private int top = 0;
private int[] array = new int[maxsize];
public void Push(int value)
{
array[top++] = value;
}
public int Pop()
{
return array[--top];
}
public int Peek()
{
return array[top - 1];
}
public bool IsEmpty()
{
return top == 0;
}
public bool IsFull()
{
return top == maxsize;
}
public string Print()
{
StringBuilder output = new StringBuilder();
for (int i = top - 1; i >= 0; i--)
output.Append(array[i] + Environment.NewLine);
return output.ToString();
}
}

Here are some methods you can add to your IntStack class that will perform the multiply and division operations. I've added minimal error checking.
public void Multiply()
{
if (array.Length < 2)
return;
var factor1 = Pop();
var factor2 = Pop();
Push(factor1 * factor2);
}
public void Divide()
{
if (array.Length < 2)
return;
var numerator = Pop();
var divisor = Pop();
if (divisor == 0) { // Return stack back to original state.
Push(divisor);
Push(numerator);
return;
}
Push(numerator / divisor);
}

interface syntax - unclear use of this

I couldn't find any explanation for the line in a red rectangle, could anyone help me breaking it down?

The given interface defines an override for the indexer operator ([]). A simple example of use:
class Point
{
public int X { get; set; }
public int Y { get; set; }
}
class PointCollection
{
public List<Point> collection { get; set; }
public Point this[int x, int y]
{
get => collection.FirstOrDefault(item => item.X == x && item.Y == y);
}
}
And then:
PointCollection points = new PointCollection();
var item = points[100, 200];

Hope this answers to your question
public class MyImage : IImage
{
private IColor[,] imageData = null;
int w=0, h = 0;
public MyImage(int width, int height, IColor defaultColor)
{
//throw new NotImplementedException();
imageData = new IColor[width, height];
for (int i = 0; i < width; i++)
{
for (int j = 0; j < height; j++)
{
imageData[i, j] = defaultColor;
}
}
}
public IColor this[int x, int y]
{
get
{
w = imageData.GetLength(0);
h = imageData.GetLength(1);
//throw new NotImplementedException();
if ((w > 0 && w > x) && (h > 0 && h > y))
return imageData[x, y];
else
return null;
}
set
{
//throw new NotImplementedException();
if ((w > 0 && w > x) && (h > 0 && h > y))
imageData[x, y] = value;
}
}
}
public interface IColor
{
}
public interface IImage
{
IColor this[int x, int y] { get; set; }
}

My rectangle area and perimeter comes out zero in C#

why the result in the area of the rectangle and it's perimeter comes out by zero?
in using C#
namespace ConsoleApplication8
{
class Program
{
static void Main(string[] args)
{
rec r = new rec();
r.L = Convert.ToInt32(Console.ReadLine());
r.wi = Convert.ToInt32(Console.ReadLine());
Console.WriteLine(r.area());
Console.WriteLine(r.per());
}
}
class rec
{
int l;
int w;
public int L
{
set
{
l = L;
}
get
{
return l;
}
}
public int wi
{
set
{
w = wi;
}
get
{
return w;
}
}
public rec()
{
}
public rec(int x, int y)
{
l = x;
w = y;
}
public double area()
{
return l * w;
}
public int per()
{
return 2 * l + 2 * w;
}
}
}

set should be using implicit value parameter. Your code sets properties to they current value instead:
private int width;
public int Width
{
get { return width; }
set { width = value; }
}
Note that since you don't use any logic in get/set you can use auto-implemented properties instead:
public int Width {get;set;} // no need to define private backing field.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

K-means with initial centers - c#

Related

Multi-threading with TPL - Access Internal Class Properties

A* pathfinding algorithm - got stuck calculating shortest path

how to to multiply and divide in a static stack?

interface syntax - unclear use of this

My rectangle area and perimeter comes out zero in C#

Categories

Resources