I have a complex problem; I don't know whether I can describe it properly or not.
I have a two-dimensional array of objects of a class. Currently my algorithm operates only on this two-dimensional array, but only some of the locations in that array are occupied (almost 40%).
It works fine for a small data set, but with a large data set (a large number of elements in that 2D array, e.g. 10000) the program becomes memory-exhaustive, because my nested loops make 10000 * 10000 = 100,000,000 iterations.
Can I replace the 2D array with a Hashtable or some other data structure? My main aim is to reduce the number of iterations only by changing the data structure.
Pardon me for not explaining properly.
I am developing in C#.
It sounds like the data structure you have is a sparse matrix, so I'm going to point you to Are there any storage optimized Sparse Matrix implementations in C#?
You can create a key for a dictionary from the array coordinates. Something like:
int key = x * 46000 + y;
(This naturally works for coordinates of an array up to 46000x46000, which is about what you can fit in an int. If you need to represent a larger array, you would use a long value as the key.)
With the key you can store and retrieve the object in a Dictionary<int, YourClass>. Storing and retrieving values from the dictionary is quite fast, not much slower than using an array.
You can iterate the items in the dictionary, but you won't get them in a predictable order, i.e. not the same as looping the x and y coordinates of an array.
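A minimal sketch of this idea (the `SparseGrid` and `Cell` names are placeholders, not from the question; here the key packs both 32-bit coordinates into a long, so there is no 46000x46000 limit):

```csharp
using System;
using System.Collections.Generic;

// Placeholder standing in for "YourClass".
class Cell { public int Value; }

class SparseGrid
{
    private readonly Dictionary<long, Cell> cells = new Dictionary<long, Cell>();

    // Pack the two 32-bit coordinates into one 64-bit key.
    private static long Key(int x, int y) => ((long)x << 32) | (uint)y;

    public void Set(int x, int y, Cell cell) => cells[Key(x, y)] = cell;

    public Cell Get(int x, int y)
    {
        Cell cell;
        return cells.TryGetValue(Key(x, y), out cell) ? cell : null;
    }

    // Iterating the dictionary touches only occupied cells,
    // so a full pass is O(occupied) instead of O(width * height).
    public Dictionary<long, Cell>.ValueCollection Occupied => cells.Values;
}
```

With ~40% occupancy this also avoids allocating the empty 60% up front, and a full scan visits only the stored cells.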
If you need high performance you can roll your own data structure.
If each object can be contained in only one container and is never moved to another, you can build a custom hash-set-like data structure.
You add X, Y and Next fields to your class.
You make a singly linked list of your objects stored in an array that acts as your hash table.
This can be very, very fast.
I wrote it from scratch, so there may be bugs.
Clear and rehash are not implemented; this is a demonstration only.
The average complexity of every operation is O(1).
To make it easy to enumerate all nodes while skipping empty slots, there is also a doubly linked list. Insertion into and removal from a doubly linked list are O(1), and enumerating all nodes while skipping unused slots is O(n), where n is the number of stored nodes, not the "virtual" size of this sparse matrix.
The doubly linked list enumerates items in the same order you inserted them; the order is unrelated to the X and Y coordinates.
public class Node
{
internal NodeTable pContainer;
internal Node pTableNext;
internal int pX;
internal int pY;
internal Node pLinkedListPrev;
internal Node pLinkedListNext;
}
public class NodeTable :
IEnumerable<Node>
{
private Node[] pTable;
private Node pLinkedListFirst;
private Node pLinkedListLast;
// Capacity must be a prime number at least as large as the number of items you want to store.
// You can make this dynamic too, but that needs some more work (rehashing and prime number computation).
public NodeTable(int capacity)
{
this.pTable = new Node[capacity];
}
public int GetHashCode(int x, int y)
{
return (x + y * 104729); // The multiplier must be a prime number
}
public Node Get(int x, int y)
{
int bucket = (GetHashCode(x, y) & 0x7FFFFFFF) % this.pTable.Length;
for (Node current = this.pTable[bucket]; current != null; current = current.pTableNext)
{
if (current.pX == x && current.pY == y)
return current;
}
return null;
}
public IEnumerator<Node> GetEnumerator()
{
// Replace yield with a custom struct enumerator to optimize performance.
for (Node node = this.pLinkedListFirst, next; node != null; node = next)
{
next = node.pLinkedListNext;
yield return node;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
public bool Set(int x, int y, Node node)
{
if (node == null || node.pContainer == null) // accept a removal (null) or a node not yet stored in a container
{
int bucket = (GetHashCode(x, y) & 0x7FFFFFFF) % this.pTable.Length;
for (Node current = this.pTable[bucket], prev = null; current != null; current = current.pTableNext)
{
if (current.pX == x && current.pY == y)
{
this.fRemoveFromLinkedList(current);
if (node == null)
{
// Remove from table linked list
if (prev != null)
prev.pTableNext = current.pTableNext;
else
this.pTable[bucket] = current.pTableNext;
current.pTableNext = null;
current.pContainer = null; // the removed node no longer belongs to this table
}
else
{
// Replace old node from table linked list
node.pTableNext = current.pTableNext;
current.pTableNext = null;
current.pContainer = null; // the replaced node no longer belongs to this table
if (prev != null)
prev.pTableNext = node;
else
this.pTable[bucket] = node;
node.pContainer = this;
node.pX = x;
node.pY = y;
this.fAddToLinkedList(node);
}
return true;
}
prev = current;
}
if (node == null)
return false; // nothing stored at (x, y) to remove
// New node.
node.pContainer = this;
node.pX = x;
node.pY = y;
// Add to table linked list
node.pTableNext = this.pTable[bucket];
this.pTable[bucket] = node;
// Add to global linked list
this.fAddToLinkedList(node);
return true;
}
return false;
}
private void fRemoveFromLinkedList(Node node)
{
Node prev = node.pLinkedListPrev;
Node next = node.pLinkedListNext;
if (prev != null)
prev.pLinkedListNext = next;
else
this.pLinkedListFirst = next;
if (next != null)
next.pLinkedListPrev = prev;
else
this.pLinkedListLast = prev;
node.pLinkedListPrev = null;
node.pLinkedListNext = null;
}
private void fAddToLinkedList(Node node)
{
node.pLinkedListPrev = this.pLinkedListLast;
if (this.pLinkedListLast != null)
this.pLinkedListLast.pLinkedListNext = node; // link the old tail forward to the new node
this.pLinkedListLast = node;
if (this.pLinkedListFirst == null)
this.pLinkedListFirst = node;
}
}
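A minimal usage sketch of the table above (assuming the `Set` guard accepts nodes that are not yet stored in a container; the names match the class above):

```csharp
var table = new NodeTable(10007);   // capacity: a prime comfortably above the expected item count
var node = new Node();

table.Set(3, 4, node);              // insert at (3, 4)
Node found = table.Get(3, 4);       // the same node, or null for an empty cell
table.Set(3, 4, null);              // remove whatever is stored at (3, 4)

foreach (Node n in table)
{
    // Visits only occupied cells, in insertion order.
}
```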
Arrays give you multiple features:
A way of organizing data as a list of elements
A way to access the data elements by index number (1st, 2nd, 3rd etc)
But a common downside (depending on the language and runtime) is that arrays often work poorly as a sparse data structure: if you don't need all of the array elements, you end up with wasted memory.
So, yes, a hashtable will usually save space over an array.
But you asked: "My main aim is to reduce the number of iterations only by changing the data structure." To answer that question, we need to know more about your algorithm: what you're doing in each loop of your program.
For example, there are many ways to sort an array or a matrix. The different algorithms for sorting use differing numbers of iterations.
I have a list of panels, sorted by their y-values. You can see my question from earlier about the specifics of why this is structured this way. Long story short, this List has the highest panel at position 0, the one below it at position 1, etc, down to the last one at the last position. I am accessing the y-coordinate of each panel using this line of code adapted from my linked question:
Panel p = panelList[someIndex];
int panelHeight = p.Top + p.Parent.Top - p.Parent.Margin.Top;
//The above line guarantees that the first panel (index 0) has y-coordinate 0 when scrolled all the way up,
//and becomes negative as the user scrolls down.
//the second panel starts with a positive y-coordinate, but grows negative after the user scrolls past the top of that page
//and so on...
I need to find the index of the panel closest to height 0, so I know which panels are currently on, or very near being on, the page. Therefore, I am trying to use the List.BinarySearch() method, which is where I'm stuck. I'm hoping to take advantage of the BinarySearch's property of returning the index that the value would be at if it did exist in the list. That way I can just search for the panel at height 0 (which I don't expect to find), but find the element nearest it (say at y=24 or y=-5 something), and know that that is the panel being rendered at the moment.
Binary Search lets you specify an IComparer to define the < and > operations, so I wrote this class:
class PanelLocationComparer : IComparer<Panel>
{
public int Compare(Panel x, Panel y)
{
//start by checking all the cases for invalid input
if (x == null && y == null) { return 0; }
else if (x == null && y != null) { return -1; }
else if (x != null && y == null) { return 1; }
else//both values are defined, compare their y values
{
int xHeight = x.Top + x.Parent.Top - x.Parent.Margin.Top;
int yHeight = y.Top + y.Parent.Top - y.Parent.Margin.Top;
if (xHeight > yHeight)
{
return 1;
}
else if (xHeight < yHeight)
{
return -1;
}
else
{
return 0;
}
}
}
}
That doesn't work, and I'm realizing now that it's because comparing two panels for greater than or less than doesn't actually involve the value I'm searching for, in this case y = 0. Is there a way to implement this in an IComparer, or is there a way to even do this type of search using the built-in BinarySearch?
I considered just making a new List of the same length as my Panel list every time, copying the y-values into it, and then searching through this list of ints for 0, but creating, searching, and destroying that list every time they scroll will hurt performance so much that it defeats the point of the binary search.
My question is also related to this one, but I couldn't figure out how to adapt it because they ultimately use a built-in comparison method, which I don't have access to in this situation.
Unfortunately the built-in BinarySearch methods cannot handle such a scenario. All they can do is search for a list item, or for something that can be extracted from a list item. Sometimes they can be used with a fake item and an appropriate comparer, but that is not applicable here.
On the other hand, binary search is quite a simple algorithm, so you can easily create one for your specific case, or better, create a custom extension method so you don't repeat yourself the next time you need something like this:
public static class Algorithms
{
public static int BinarySearch<TSource, TValue>(this IReadOnlyList<TSource> source, TValue value, Func<TSource, TValue> valueSelector, IComparer<TValue> valueComparer = null)
{
return source.BinarySearch(0, source.Count, value, valueSelector, valueComparer);
}
public static int BinarySearch<TSource, TValue>(this IReadOnlyList<TSource> source, int start, int count, TValue value, Func<TSource, TValue> valueSelector, IComparer<TValue> valueComparer = null)
{
if (valueComparer == null) valueComparer = Comparer<TValue>.Default;
int lo = start, hi = lo + count - 1;
while (lo <= hi)
{
int mid = lo + (hi - lo) / 2;
int compare = valueComparer.Compare(value, valueSelector(source[mid]));
if (compare < 0) hi = mid - 1;
else if (compare > 0) lo = mid + 1;
else return mid;
}
return ~lo; // Same behavior as the built-in methods
}
}
and then simply use:
int index = panelList.BinarySearch(0, p => p.Top + p.Parent.Top - p.Parent.Margin.Top);
class Node
{
int number;
Vector2 position;
public Node(int number, Vector2 position)
{
this.number = number;
this.position = position;
}
}
List<Node> nodes = new List<Node>();
for (int i = 0; i < nodes.Count; i++) //basically a foreach
{
// Here i would like to find each node from the list, in the order of their numbers,
// and check their vectors
}
So, as the code pretty much shows, I am wondering how I can:
find a specific node in the list, specifically the one with the "number" attribute equal to i (e.g. going through all of them in the order of their "number" attribute);
check its other attribute.
Have tried:
nodes.Find(Node => Node.number == i);
Node test = nodes[i];
place = test.position
but apparently they can't access node.number / node.position due to its protection level.
Also, the second one has the problem that the nodes have to be sorted first.
Also looked at this question
but the [] solution is in the "Tried" category above, and the foreach solution doesn't seem to work for custom classes.
I'm a coding newbie (like 60 hours), so please don't:
explain it in an insanely hard way;
say I am dumb for not knowing a basic thing like this.
Thanks!
I would add properties for Number and Position, making them available to outside users (currently their access modifier is private):
class Node
{
public Node(int number, Vector2 position)
{
this.Number = number;
this.Position = position;
}
public int Number { get; private set; }
public Vector2 Position { get; private set; }
}
Now your original attempt should work:
nodes.Find(node => node.Number == i);
However, it sounds like sorting the List<Node> and then accessing by index would be faster. You would sort the list once and index it directly, versus scanning the whole list on every iteration for the item you want.
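If you go the sorting route, `List<T>.Sort` with a comparison delegate does it in one line; a minimal sketch (the `Vector2` position field is dropped so it compiles standalone):

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public Node(int number) { Number = number; }
    public int Number { get; private set; }
    // The Vector2 position field is omitted so the sketch compiles standalone.
}

class Program
{
    static void Main()
    {
        var nodes = new List<Node> { new Node(2), new Node(0), new Node(1) };

        // Sort once by Number; afterwards nodes[i].Number == i,
        // so you can index directly instead of calling Find in a loop.
        nodes.Sort((a, b) => a.Number.CompareTo(b.Number));

        for (int i = 0; i < nodes.Count; i++)
            Console.WriteLine(nodes[i].Number); // prints 0, 1, 2
    }
}
```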
List<Node> SortNodes(List<Node> nodes)
{
List<Node> sortedNodes = new List<Node>();
int length = nodes.Count; // Cache the original count: nodes.Count shrinks as we remove items, but we still need all the numbers!
int a = 0; // The current number we are looking for
while (a < length)
{
for (int i = 0; i < nodes.Count; i++)
{
// if the node's number is the number we are looking for,
if (nodes[i].number == a)
{
sortedNodes.Add(nodes[i]); // add it to the list
nodes.RemoveAt(i); // and remove it so we don't have to search it again.
a++; // go to the next number
break; // break the loop
}
}
}
return sortedNodes;
}
This is a simple sort function; you need to make the number field public first.
It will return a list with the nodes in the order you want.
Also: the searching gets faster as more nodes are moved to the sorted list.
Make sure that all nodes have a different number! Otherwise it will get stuck in an infinite loop!
I'm trying to calculate the shortest paths. This does work with the below pasted implementation of Dijkstra. However I want to speed the things up.
I use this implementation to decide which field I want to go to next. The graph represents a two-dimensional array where all fields are connected to their neighbours. But over time the following happens: I need to remove some edges (there are obstacles). The start node is my current position, which also changes over time.
This means:
I never add a node, never add a new edge, and never change the weight of an edge. The only operation is removing an edge.
The start node does change over time
Questions:
Is there an algorithm which can do a fast recalculation of the shortest paths when I know that the only change in the graph is the removal of an edge?
Is there an algorithm which allows me to quickly recalculate the shortest path when the start node changes only to one of its neighbours?
Is another algorithm maybe better suited to my problem?
Thanks for your help.
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text;
public class Dijkstra<T>
{
private Node<T> calculatedStart;
private ReadOnlyCollection<Node<T>> Nodes { get; set; }
private ReadOnlyCollection<Edge<T>> Edges { get; set; }
private List<Node<T>> NodesToInspect { get; set; }
private Dictionary<Node<T>, int> Distance { get; set; }
private Dictionary<Node<T>, Node<T>> PreviousNode { get; set; }
public Dijkstra (ReadOnlyCollection<Edge<T>> edges, ReadOnlyCollection<Node<T>> nodes)
{
Edges = edges;
Nodes = nodes;
NodesToInspect = new List<Node<T>> ();
Distance = new Dictionary<Node<T>, int> ();
PreviousNode = new Dictionary<Node<T>, Node<T>> ();
foreach (Node<T> n in Nodes) {
PreviousNode.Add (n, null);
NodesToInspect.Add (n);
Distance.Add (n, int.MaxValue);
}
}
public LinkedList<T> GetPath (T start, T destination)
{
Node<T> startNode = new Node<T> (start);
Node<T> destinationNode = new Node<T> (destination);
CalculateAllShortestDistances (startNode);
// building path going back from the destination to the start always taking the nearest node
LinkedList<T> path = new LinkedList<T> ();
path.AddFirst (destinationNode.Value);
while (PreviousNode[destinationNode] != null) {
destinationNode = PreviousNode [destinationNode];
path.AddFirst (destinationNode.Value);
}
path.RemoveFirst ();
return path;
}
private void CalculateAllShortestDistances (Node<T> startNode)
{
if (startNode.Equals (calculatedStart)) { // calculatedStart is a Node<T>, so compare nodes, not the value
return;
}
Distance [startNode] = 0;
while (NodesToInspect.Count > 0) {
Node<T> nearestNode = GetNodeWithSmallestDistance ();
// if we cannot find another node with the function above we can exit the algorithm and clear the
// nodes to inspect because they would not be reachable from the start or will not be able to shorten the paths...
// this algorithm does also implicitly kind of calculate the minimum spanning tree...
if (nearestNode == null) {
NodesToInspect.Clear ();
} else {
foreach (Node<T> neighbour in GetNeighborsFromNodesToInspect(nearestNode)) {
// calculate distance with the currently inspected neighbour
int dist = Distance [nearestNode] + GetDirectDistanceBetween (nearestNode, neighbour);
// set the neighbour as shortest if it is better than the current shortest distance
if (dist < Distance [neighbour]) {
Distance [neighbour] = dist;
PreviousNode [neighbour] = nearestNode;
}
}
NodesToInspect.Remove (nearestNode);
}
}
calculatedStart = startNode;
}
private Node<T> GetNodeWithSmallestDistance ()
{
int distance = int.MaxValue;
Node<T> smallest = null;
foreach (Node<T> inspectedNode in NodesToInspect) {
if (Distance [inspectedNode] < distance) {
distance = Distance [inspectedNode];
smallest = inspectedNode;
}
}
return smallest;
}
private List<Node<T>> GetNeighborsFromNodesToInspect (Node<T> n)
{
List<Node<T>> neighbors = new List<Node<T>> ();
foreach (Edge<T> e in Edges) {
if (e.Start.Equals (n) && NodesToInspect.Contains (e.End)) // the neighbour (e.End) must still be open for inspection
neighbors.Add (e.End);
}
}
return neighbors;
}
private int GetDirectDistanceBetween (Node<T> startNode, Node<T> endNode)
{
foreach (Edge<T> e in Edges) {
if (e.Start.Equals (startNode) && e.End.Equals (endNode)) {
return e.Distance;
}
}
return int.MaxValue;
}
}
Is there an algorithm which can do a fast recalculation of the shortest paths when I know that the only change in the graph is the removal of an edge?
Is there an algorithm which allows me to quickly recalculate the shortest path when the start node changes only to one of its neighbours?
The answer to both of these questions is yes.
For the first case, the algorithm you're looking for is called LPA* (sometimes, less commonly, called Incremental A*. The title on that paper is outdated). It's a (rather complicated) modification to A* that allows fast recalculation of best paths when only a few edges have changed.
Like A*, LPA* requires an admissible distance heuristic. If no such heuristic exists, you can just set it to 0. Doing this in A* essentially turns it into Dijkstra's algorithm; doing this in LPA* turns it into an obscure, rarely-used algorithm called DynamicSWSF-SP.
For the second case, you're looking for D*-Lite. It is a pretty simple modification to LPA* (simple, at least, once you understand LPA*) that does incremental pathfinding as the unit moves from start-to-finish and new information is gained (edges are added/removed/changed). It is primarily used for robots traversing an unknown or partially-known terrain.
I've written up a fairly comprehensive answer (with links to papers, in the question) on various pathfinding algorithms here.
I want to do some performance measuring of a particular method, but I'd like to average the time it takes to complete. (This is a C# Winforms application, but this question could well apply to other frameworks.)
I have a Stopwatch which I reset at the start of the method and stop at the end. I'd like to store the last 10 values in a list or array. Each new value added should push the oldest value off the list.
Periodically I will call another method which will average all stored values.
Am I correct in thinking that this construct is a circular buffer?
How can I create such a buffer with optimal performance? Right now I have the following:
List<long> PerfTimes = new List<long>(10);
// ...
private void DoStuff()
{
MyStopWatch.Restart();
// ...
MyStopWatch.Stop();
PerfTimes.Add(MyStopWatch.ElapsedMilliseconds);
if (PerfTimes.Count > 10) PerfTimes.RemoveAt(0);
}
This seems inefficient somehow, but perhaps it's not.
Suggestions?
You could create a custom collection:
class SlidingBuffer<T> : IEnumerable<T>
{
private readonly Queue<T> _queue;
private readonly int _maxCount;
public SlidingBuffer(int maxCount)
{
_maxCount = maxCount;
_queue = new Queue<T>(maxCount);
}
public void Add(T item)
{
if (_queue.Count == _maxCount)
_queue.Dequeue();
_queue.Enqueue(item);
}
public IEnumerator<T> GetEnumerator()
{
return _queue.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Your current solution works, but it's inefficient, because removing the first item of a List<T> is expensive.
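Since `SlidingBuffer<T>` implements `IEnumerable<T>`, LINQ's `Average` handles the averaging step directly. A self-contained sketch (re-stating the buffer so it compiles on its own):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

class SlidingBuffer<T> : IEnumerable<T>
{
    private readonly Queue<T> _queue;
    private readonly int _maxCount;

    public SlidingBuffer(int maxCount)
    {
        _maxCount = maxCount;
        _queue = new Queue<T>(maxCount);
    }

    public void Add(T item)
    {
        if (_queue.Count == _maxCount)
            _queue.Dequeue();          // drop the oldest value
        _queue.Enqueue(item);
    }

    public IEnumerator<T> GetEnumerator() => _queue.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

class Program
{
    static void Main()
    {
        var perfTimes = new SlidingBuffer<long>(3);
        foreach (long t in new long[] { 10, 20, 30, 40 })
            perfTimes.Add(t);

        // Only the last three values remain: 20, 30, 40.
        Console.WriteLine(perfTimes.Average()); // 30
    }
}
```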
private int ct = 0;
private long[] times = new long[10];
void DoStuff ()
{
...
times[ct] = MyStopWatch.ElapsedMilliseconds;
ct = (ct + 1) % times.Length; // Wrap back around to 0 when we reach the end.
}
Here is a simple circular structure.
This requires none of the array copying or garbage collection of linked list nodes that the other solutions have.
For optimal performance, you can probably just use an array of longs rather than a list.
We had a similar requirement at one point to implement a download time estimator, and we used a circular buffer to store the speed over each of the last N seconds.
We weren't interested in how fast the download was over the entire time, just roughly how long it was expected to take based on recent activity but not so recent that the figures would be jumping all over the place (such as if we just used the last second to calculate it).
The reason we weren't interested in the entire time frame was that a download could do 1M/s for half an hour and then switch up to 10M/s for the next ten minutes. That first half hour would drag down the average speed quite severely, despite the fact that you're now downloading quite fast.
We created a circular buffer with each cell holding the amount downloaded in a 1-second period. The circular buffer size was 300, allowing for 5 minutes of historical data, and every cell was initialised to zero. In your case, you would only need ten cells.
We also maintained a total (the sum of all entries in the buffer, so also initially zero) and the count (initially zero, obviously).
Every second, we would figure out how much data had been downloaded since the last second and then:
subtract the current cell from the total.
put the current figure into that cell and advance the cell pointer.
add that current figure to the total.
increase the count if it wasn't already 300.
update the figure displayed to the user, based on total / count.
Basically, in pseudo-code:
def init (sz):
buffer = new int[sz]
for i = 0 to sz - 1:
buffer[i] = 0
total = 0
count = 0
index = 0
maxsz = sz
def update (kbps):
total = total - buffer[index] + kbps # Adjust sum based on deleted/inserted values.
buffer[index] = kbps # Insert new value.
index = (index + 1) % maxsz # Update pointer.
if count < maxsz: # Update count.
count = count + 1
return total / count # Return average.
That should be easily adaptable to your own requirements. The sum is a nice feature to "cache" information which may make your code even faster. By that I mean: if you need to work out the sum or average, you can work it out only when the data changes, and using the minimal necessary calculations.
The alternative would be a function which added up all ten numbers when requested, something that would be slower than the single subtract/add when loading another value into the buffer.
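The pseudo-code above translates to C# along these lines (names are illustrative, not from the original project):

```csharp
using System;

class RollingAverage
{
    private readonly long[] buffer; // one slot per sample, all zero-initialised
    private long total;             // cached sum of the buffer contents
    private int count;              // how many slots have been filled so far
    private int index;              // next slot to overwrite

    public RollingAverage(int size)
    {
        buffer = new long[size];
    }

    // Store a new sample, evicting the oldest one, and return the running average.
    public double Update(long value)
    {
        total = total - buffer[index] + value; // adjust the cached sum
        buffer[index] = value;
        index = (index + 1) % buffer.Length;   // advance the cell pointer, wrapping around
        if (count < buffer.Length)
            count++;
        return (double)total / count;
    }
}
```

For the question's scenario, `new RollingAverage(10)` and a call to `Update(stopwatch.ElapsedMilliseconds)` after each run would give the average of the last ten timings in O(1) per call.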
You may want to look at using the Queue data structure instead. You could use a simple linear List, but it is wholly inefficient. A circular array could be used, but then you must handle the wrap-around bookkeeping yourself. Therefore, I suggest you go with the Queue.
I needed to keep the last 5 scores in an array and I came up with this simple solution.
Hope it will help someone.
void UpdateScoreRecords(int _latestScore){
latestScore = _latestScore;
for (int cnt = 0; cnt < scoreRecords.Length; cnt++) {
if (cnt == scoreRecords.Length - 1) {
scoreRecords [cnt] = latestScore;
} else {
scoreRecords [cnt] = scoreRecords [cnt+1];
}
}
}
Seems okay to me. What about using a LinkedList instead? When using a List, if you remove the first item, all of the other items have to be bumped back one item. With a LinkedList, you can add or remove items anywhere in the list at very little cost. However, I don't know how much difference this would make, since we're only talking about ten items.
The trade-off of a linked list is that you can't efficiently access random elements of the list, because the linked list must essentially "walk" along the list, passing each item, until it gets to the one you need. But for sequential access, linked lists are fine.
For Java, it could look like this:
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;
public class SlidingBuffer<T> implements Iterable<T>{
private Queue<T> _queue;
private int _maxCount;
public SlidingBuffer(int maxCount) {
_maxCount = maxCount;
_queue = new LinkedList<T>();
}
public void Add(T item) {
if (_queue.size() == _maxCount)
_queue.remove();
_queue.add(item);
}
public Queue<T> getQueue() {
return _queue;
}
public Iterator<T> iterator() {
return _queue.iterator();
}
}
It could be started like this:
public class ListT {
public static void main(String[] args) {
start();
}
private static void start() {
SlidingBuffer<String> sb = new SlidingBuffer<>(5);
sb.Add("Array1");
sb.Add("Array2");
sb.Add("Array3");
sb.Add("Array4");
sb.Add("Array5");
sb.Add("Array6");
sb.Add("Array7");
sb.Add("Array8");
sb.Add("Array9");
//Test printout
for (String s: sb) {
System.out.println(s);
}
}
}
The result is
Array5
Array6
Array7
Array8
Array9
Years after the latest answer, I stumbled on this question while looking for the same solution. I ended up with a combination of the above answers, especially the cycling index from agent-j and the use of a queue from Thomas Levesque.
public class SlidingBuffer<T> : IEnumerable<T>
{
protected T[] items;
protected int index = -1;
protected bool hasCycled = false;
public SlidingBuffer(int windowSize)
{
items = new T[windowSize];
}
public void Add(T item)
{
index++;
if (index >= items.Length) {
hasCycled = true;
index %= items.Length;
}
items[index] = item;
}
public IEnumerator<T> GetEnumerator()
{
if (index == -1)
yield break;
for (int i = index; i > -1; i--)
{
yield return items[i];
}
if (hasCycled)
{
for (int i = items.Length-1; i > index; i--)
{
yield return items[i];
}
}
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
I had to forgo the very elegant one-liner from agent-j (ct = (ct + 1) % times.Length;) because I needed to detect when we circled back (through hasCycled) to have a well-behaved enumerator. Note that the enumerator returns values from most recent to oldest.
Please help. I've been trying to generate a random binary search tree of size 1024, and the elements need to be a random sorted set. I'm able to write code that creates a binary search tree by adding elements manually, but I'm unable to write code that generates a random balanced binary tree of size 1024 and then tries to find a key in that tree. Please help, and thank you in advance.
Edit added code from comments
Yes, it is homework... and this is what I've got so far as code:
using System;
namespace bst {
public class Node {
public int value;
public Node Right = null;
public Node Left = null;
public Node(int value)
{
this.value = value;
}
}
public class BST {
public Node Root = null;
public BST() { }
public void Add(int new_value)
{
if(Search(new_value))
{
Console.WriteLine("value (" + new_value + ") already");
}
else
{
AddNode(this.Root,new_value);
}
}
}
}
Use recursion.
Each branch generates a new branch: select the middle item of the unsorted set, the median, and put it in the current item of the tree. Copy all items less than the median to another array and pass that new array to a recursive call of the same method; copy all items greater than the median to another array and pass that to another recursive call.
Balanced trees have to have an odd number of items, unless the main parent node is not filled in. If there are two values that could be the median, you need to decide whether the duplicate belongs on the lower branch or the upper branch. I put duplicates on the upper branch in my example.
The median is the number with an equal amount of numbers less than and greater than it. For example: 1, 2, 3, 3, 4, 18, 29, 105, 123.
In this case, the median is 4, even though the mean (or average) is much higher.
I didn't include code that determines the median.
BuildTreeItem(TreeItem Item, Array Set)
{
Array Smalls;
Array Larges;
Median = DetermineMedian(Set);
Item.Value = Median;
if(Set.Count() == 1)
return;
bool medianPlaced = false;
for (int i = 0; i < Set.Count(); i++)
{
if (Set[i] < Median)
{
Smalls.Add(Set[i]);
}
else if (Set[i] == Median && !medianPlaced)
{
medianPlaced = true; // the chosen median already lives in Item, so skip one occurrence
}
else
{
Larges.Add(Set[i]); // duplicates of the median go on the upper branch
}
}
if (Smalls.Count() > 0)
{
Item.Lower = new TreeItem;
BuildTreeItem(Item.Lower, Smalls);
}
if (Larges.Count() > 0)
{
Item.Upper = new TreeItem;
BuildTreeItem(Item.Upper, Larges);
}
}
Unless it is homework, the easiest solution would be to sort the data first and then build the tree by using the middle item as the root and descending down each half. The method proposed by Xaade is similar, but much slower due to the DetermineMedian complexity.
The other option is to actually look at algorithms that build balanced trees (like http://en.wikipedia.org/wiki/Red-black_tree ) to see if it fits your requirements.
EDIT: removing the incorrect statement about the speed of Xaade's algorithm. It is actually as fast as quicksort (n log n: each element is checked on every level of recursion, with log n levels of recursion); not sure why I estimated it slower.
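The sort-then-split approach described above can be sketched like this (the Node shape follows the question's class; TreeBuilder and the value range are illustrative):

```csharp
using System;

class Node
{
    public int value;
    public Node Left, Right;
    public Node(int value) { this.value = value; }
}

static class TreeBuilder
{
    // Middle element of the sorted range becomes the root; recurse on each half.
    public static Node Build(int[] sorted, int lo, int hi)
    {
        if (lo > hi) return null;
        int mid = lo + (hi - lo) / 2;
        var node = new Node(sorted[mid]);
        node.Left = Build(sorted, lo, mid - 1);
        node.Right = Build(sorted, mid + 1, hi);
        return node;
    }

    // Standard BST lookup.
    public static bool Find(Node root, int key)
    {
        while (root != null)
        {
            if (key == root.value) return true;
            root = key < root.value ? root.Left : root.Right;
        }
        return false;
    }
}

class Program
{
    static void Main()
    {
        // 1024 random values, sorted, then built into a balanced BST.
        var rnd = new Random();
        var values = new int[1024];
        for (int i = 0; i < values.Length; i++) values[i] = rnd.Next(1000000);
        Array.Sort(values);

        Node root = TreeBuilder.Build(values, 0, values.Length - 1);
        Console.WriteLine(TreeBuilder.Find(root, values[512])); // True
    }
}
```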