Predicting the amount of parents in a binary tree


I'm trying to predict the number of parents in a binary tree, given that all you know is the number of leaves and that it's a balanced binary tree.
Currently, my code runs like this:
int width = exits;
int amountOfParents = 0;
do
{
    width -= 2;
    amountOfParents++;
} while (width > 0);
The basic premise of the code is that it takes all the children and finds the number of parents for them, then does this iteratively until it reaches the root. However, the problem comes in when the height of the tree is uneven.
This solution gives the correct number of parents up to 5 leaves. When it hits 6, the binary tree creates another parent node, so there should be 4, but it gives 3. I know why it gives 3, but I don't know how to fix it.
Edit: I just had another idea. What if I find the closest perfectly balanced tree (the nearest "perfect square" size) and then individually account for the leftover leaves? Trying now.

The formula is log2(exits) * 2 + 1
C#: Math.Ceiling(Math.Log(x) / Math.Log(2)) * 2 + 1;
But the tree does have to be perfectly balanced.
Since I'm doing the inverse of square numbers, your idea of the square numbers could work.

I tried it with several different methods, but this seems to be the best one.
You take all the children and group them into pairs. Put the pairs into a list and do the same thing again. If there is an odd number of children, just put the leftover one into the list on its own. It will get handled in a later iteration.
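The pairing idea above can be sketched roughly as follows, in Python for brevity (the logic carries over to C# directly). The function name `count_parents` is made up for illustration, and the sketch assumes that every pair of nodes gets exactly one new parent while an unpaired node simply moves up a level. Under that reading the count always comes out as leaves - 1, which may not match the tree shape the question has in mind.

```python
def count_parents(leaves):
    """Count the parents created by repeatedly pairing up nodes (assumed reading)."""
    parents = 0
    width = leaves
    while width > 1:
        pairs, leftover = divmod(width, 2)
        parents += pairs            # each pair gets one new parent
        width = pairs + leftover    # the new parents plus any unpaired node move up
    return parents


print(count_parents(8))
```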

Related

Efficient approach to get all elements within a range in a 2d array?

Consider the following 2 dimensional jagged array
[0,0] [0,1] [0,2]
[1,0] [1,1]
[2,0] [2,1] [2,2]
Let's say I want to know all elements that fall within a certain range, for example [0,0]-[2,0]. I would like a list of all of those elements. What would be the best approach for achieving this, and are there any pre-existing algorithms that do it?
I have attempted to implement this in C# but didn't get much further than some for loops.
The example below gives further detail on what I would like to achieve.
Using the array defined above, let's say the start index is [0,0] and the end index of the range is [2,1].
I would like to create a method that returns the values of all the indexes that fall within this range.
The expected result would be for the method to return the stored values for the following indexes:
[0,0] [0,1] [0,2] [1,0] [1,1] [2,0] [2,1]

If the 2D array is "sorted", meaning that as you go from left to right in each 1D array the y increases and as you go from top to bottom the x increases, you can find the first point and the last point that you need to report using binary searches in a total time of O(log n), and after that report every point between those 2 points in O(k), where k is the number of points that you need to report (notice that the time complexity will be Omega(k) for every algorithm).
If the 2D array is not sorted and you just want to output all the pairs between pair A and pair B:
# array_2d, A and B are assumed to be defined by the caller
should_print = False
should_stop = False
for i in range(len(array_2d)):
    for j in range(len(array_2d[i])):
        # start printing once A has been seen
        should_print = should_print or (array_2d[i][j] == A)
        if should_print:
            print(array_2d[i][j])
        # stop after B has been reached
        should_stop = (array_2d[i][j] == B)
        if should_stop:
            break
    if should_stop:
        break
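For the case in the question, where the range is given by a start index and an end index rather than by values, no searching is needed at all. A minimal Python sketch, assuming the range is meant in row-major order (as the expected output in the question suggests) and using the made-up name `elements_in_range`:

```python
def elements_in_range(jagged, start, end):
    """Return the values between two (row, column) indexes, inclusive, in row-major order."""
    (first_row, first_col), (last_row, last_col) = start, end
    result = []
    for r in range(first_row, last_row + 1):
        row = jagged[r]
        lo = first_col if r == first_row else 0            # first row starts at the start column
        hi = last_col if r == last_row else len(row) - 1   # last row ends at the end column
        result.extend(row[lo:hi + 1])
    return result


# The jagged array from the question, with each cell storing its own index as text
jagged = [["0,0", "0,1", "0,2"],
          ["1,0", "1,1"],
          ["2,0", "2,1", "2,2"]]
print(elements_in_range(jagged, (0, 0), (2, 1)))
```

This touches only the k reported elements plus one step per row, so it stays within the Omega(k) bound mentioned above.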
If you just have n general 2D points and you wish to answer the query "find me all the points in a given rectangle", there are 2 data structures that can help you: kd trees and range trees. These 2 data structures give you a good query time, but they are a bit complicated. I am not sure what your current level is, but if you are just starting to get into data structures and algorithms, these data structures are probably overkill.
Edit (a bit about range trees and kd trees):
First I'll explain the basic concept behind range trees. Let's start by trying to answer the range query for 1D points. This is easy: just build a BST (balanced search tree), and with it you can answer queries in O(log n + k), where k is the number of points being reported. It takes O(n log n) time to build the BST and it takes O(n) space.
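As a rough illustration of the 1D case, here is a Python sketch that uses a sorted list with binary search as a stand-in for the balanced search tree (a swap made here for brevity, not what the answer prescribes). It gives the same O(log n + k) query cost for a static point set. The name `range_query_1d` is made up for illustration.

```python
import bisect


def range_query_1d(sorted_values, lo, hi):
    """Return all values v with lo <= v <= hi from an already sorted list."""
    left = bisect.bisect_left(sorted_values, lo)     # first position with value >= lo
    right = bisect.bisect_right(sorted_values, hi)   # first position with value > hi
    return sorted_values[left:right]


values = sorted([7, 1, 9, 3, 5])   # O(n log n) preprocessing
print(range_query_1d(values, 3, 7))
```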
Now, let us try to take this solution and make it work for 2D. We will build a BST for the x coordinates of the points. For each node t in the BST, denote all the points in the subtree of t by sub(t). Now, for every node t we will build a BST for the y coordinates of the points in sub(t).
Now, given a range, we will find all the subtrees contained in the x range using the first BST, and for each subtree we will find all the points contained in the y range (note that the BST corresponding to sub(t) is saved at the node t).
A query takes O(log^2 n) time, plus the k points being reported. Building the DS takes O(n log^2 n) time and, finally, it takes O(n log n) space. I'll let you prove these statements. With more work the query time can be reduced to O(log n) and the building time can be reduced to O(n log n). You can read about it here: http://www.cs.uu.nl/docs/vakken/ga/2021/slides/slides5b.pdf.
Now a word about KD trees. The idea is to split the 2D space in the middle with a vertical line, after that split each side in the middle with a horizontal line, and so on. A query in this DS takes O(sqrt(n)) time plus the k points being reported, the building time is O(n log n) and the space it takes is O(n). You can read more about this DS here: http://www.cs.uu.nl/docs/vakken/ga/2021/slides/slides5a.pdf
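To make the KD tree idea a little more concrete, here is a small Python sketch. It is a simplified illustration rather than a tuned implementation: it re-sorts the points at every level, so building costs O(n log^2 n) instead of the O(n log n) quoted above, and the names `build_kd` and `range_query` are made up for this example.

```python
def build_kd(points, depth=0):
    """Build a 2D kd tree; even depths split on x (vertical line), odd depths on y."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kd(pts[:mid], depth + 1),       # coordinates <= the split value
        "right": build_kd(pts[mid + 1:], depth + 1),  # coordinates >= the split value
    }


def range_query(node, lo, hi, out=None):
    """Collect all points p with lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(p)
    if lo[axis] <= p[axis]:   # the query rectangle reaches into the left/lower side
        range_query(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:   # the query rectangle reaches into the right/upper side
        range_query(node["right"], lo, hi, out)
    return out


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd(points)
print(sorted(range_query(tree, (4, 1), (8, 5))))
```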

Implementation of array with negative indices

I am making a game with a world that extends infinitely in every direction. This means that you can be at position X:50, Y:50 or X:-50, Y:-50. But... I can't really do that with a normal C# List...
All the ideas I've come up with seem to be too complicated/inefficient to work...

The easiest way to implement an infinite grid is a sparse matrix: a dictionary with an x,y pair as the key and the data you want to store as the value. This is fast, easy to implement, and memory friendly if your grid is sparse.
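A minimal sketch of the dictionary-backed sparse grid, written in Python for brevity (in C# the same role would be played by a Dictionary keyed on a coordinate pair). The class name `SparseGrid` and its methods are made up for illustration.

```python
class SparseGrid:
    """Unbounded 2D grid that only stores the cells that were actually set."""

    def __init__(self, default=None):
        self._cells = {}         # maps (x, y) -> stored value
        self._default = default  # returned for cells that were never set

    def get(self, x, y):
        return self._cells.get((x, y), self._default)

    def set(self, x, y, value):
        self._cells[(x, y)] = value

    def __len__(self):
        return len(self._cells)


grid = SparseGrid(default="empty")
grid.set(50, 50, "tree")
grid.set(-50, -50, "rock")   # negative coordinates need no special handling
print(grid.get(-50, -50), grid.get(0, 0), len(grid))
```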
Another way is a linked grid (similar to a linked list, but with pointers in 4 directions), or a tile-based approach to reduce the overhead of the linked grid (a tile is a linked grid of NxN arrays). Implementing tiles is quite complicated, but it is a good tradeoff between memory and performance for very dense grids.
But my personal favorite approach is the even-odd transformation: odd physical indices hold the positive virtual indices, while even physical indices hold the negative ones (and 0 maps to 0). To transform from a virtual index v to a physical index p, you use the formula p = abs(v * 2) - (v > 0 ? 1 : 0), and to convert a physical index back to a virtual index you do v = (p % 2 == 1 ? +1 : -1) * ((2*p + 3) / 4), using integer division. This works because there is a one-to-one and onto relation (bijection) between natural numbers and integers: (0 <-> 0), (1 <-> 1), (2 <-> -1), (3 <-> 2), (4 <-> -2), (5 <-> 3), (6 <-> -3), .... This approach is fast, simple and elegant, but not great memory-wise when you have a very sparse grid with items extremely far from the center line.
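The two formulas can be checked with a short Python sketch. The function names `to_physical` and `to_virtual` are made up for illustration, and `//` is Python's integer division, matching the integer `/` the C#-style formula relies on.

```python
def to_physical(v):
    """Map a virtual index (any integer) to a physical index (0, 1, 2, ...)."""
    return abs(v * 2) - (1 if v > 0 else 0)


def to_virtual(p):
    """Inverse mapping: odd physical slots are positive, even ones are negative."""
    sign = 1 if p % 2 == 1 else -1
    return sign * ((2 * p + 3) // 4)


print([to_physical(v) for v in (0, 1, -1, 2, -2, 3, -3)])
print([to_virtual(p) for p in range(7)])
```

In 2D the same mapping would be applied to the x and y indices separately.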

Unless you have a TON (yes, a TON of bits...) of cells, you can use dictionaries. Combine that with a System.Drawing.Point as the key, and you get a good thing going on:
Dictionary<Point,YourGridObject> myMap = new Dictionary<Point,YourGridObject>();
Edit: In addition to the dictionary, each cell can hold a reference to its adjacent cells. This way you can use the dictionary to go directly "somewhere", but then navigate through the adjacent cells. I used this approach to implement an A* pathfinding algorithm on a hex grid.
Edit 2:
For example, if you then want to access a specific coordinate, you can simply do
var myTile = myMap[new Point(25, -25)];
Then, if you want to get the East tile, you can do
var eastTile = myTile.East;
Your grid object could also implement an offset method so you could get the 'West 2, North 5' tile by
var otherTile = myTile.Offset(-2, 5);
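A rough Python sketch of such a grid object follows. One deliberate simplification: instead of storing references to the adjacent cells, as the answer describes, each tile looks its neighbours up in the shared dictionary on demand. The `Tile` class, its `east` property and its `offset` method are hypothetical names mirroring the C# snippets above, and north is assumed to be the positive y direction.

```python
class Tile:
    """A grid cell that can navigate to other cells through the shared map."""

    def __init__(self, world, x, y):
        self.world = world   # the dictionary that maps (x, y) -> Tile
        self.x = x
        self.y = y

    def offset(self, dx, dy):
        """Return the tile dx steps east and dy steps north, or None if absent."""
        return self.world.get((self.x + dx, self.y + dy))

    @property
    def east(self):
        return self.offset(1, 0)


world = {}
for x in range(20, 30):
    for y in range(-30, -15):
        world[(x, y)] = Tile(world, x, y)

my_tile = world[(25, -25)]
east_tile = my_tile.east
other_tile = my_tile.offset(-2, 5)   # west 2, north 5
print((east_tile.x, east_tile.y), (other_tile.x, other_tile.y))
```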

How about using two Lists underneath, one for expansion in each direction?
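One way to read this suggestion, sketched in Python for a single axis: one list holds the indices 0, 1, 2, ... and the other holds -1, -2, -3, .... The class name `TwoWayList` and the fill-with-default behaviour are assumptions made for this example.

```python
class TwoWayList:
    """List-like container that accepts negative as well as non-negative indices."""

    def __init__(self, default=None):
        self._non_negative = []   # slot i holds index i
        self._negative = []       # slot i holds index -(i + 1)
        self._default = default

    def _locate(self, index):
        if index >= 0:
            return self._non_negative, index
        return self._negative, -index - 1

    def __getitem__(self, index):
        store, slot = self._locate(index)
        return store[slot] if slot < len(store) else self._default

    def __setitem__(self, index, value):
        store, slot = self._locate(index)
        if slot >= len(store):
            # grow the backing list up to the requested slot
            store.extend([self._default] * (slot + 1 - len(store)))
        store[slot] = value


row = TwoWayList(default=0)
row[-50] = 7
row[50] = 9
print(row[-50], row[50], row[-1])
```

A 2D world would need a TwoWayList of TwoWayLists, and memory use grows with the distance from the origin rather than with the number of occupied cells.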

I'm not certain if this is more complicated than you want to deal with, but have you considered using polar coordinates instead of Cartesian? There are no negative numbers in that coordinate system. I realize that the conversion is difficult at first, but once you wrap your head around it, it becomes second nature.

You could use a Dictionary, which gives you the capability of an array and, unlike an array, also accepts negative indexes.

Computers cannot store infinite arrays.
There must be a boundary to your array; remember that somewhere in your code you declared a specific size when initializing it.
Perhaps you resize it somewhere, but that still leaves a number range from 0 to some maximum.
So what you should do is write a function that allows relative positioning in such a map. You store your current map[x,y] as a position,
and you are able to move up, for example, through a function that adds to or subtracts from your current position.
This keeps your code easier to understand too.
If you're not dealing with game maps but with number ranges, let's say vectors,
you could create a list of n points, or a 2D dictionary.
I'm posting this here because your problem might lead people to write wrong code.
I'm also adding a note for people in situations where there is a border around a map (typical in game scenarios and image manipulation):
where your data goes from [-1..width+1], just dimension it as [0,width+2]
and then loop through it starting with 'for (int x = 1; x < Width+1; x++)'.

Alternatives to the Dynamic Time Warping (DTW) method

I am doing some research into methods of comparing time series data. One of the algorithms that I have found being used for matching this type of data is the DTW (Dynamic Time Warping) algorithm.
The data I have resembles the following structure (this can be one path):
Path  Event  Time     Location (x,y)
1     1      2:30:02  1,5
1     2      2:30:04  2,7
1     3      2:30:06  4,4
...
Now, I was wondering whether there are other algorithms that would be suitable to find the closest match for the given path.

The keyword you are looking for is "(dis-)similarity measures".
Euclidean Distance (ED), as referred to by Adam Mihalcin (first answer), is easily computable and somehow reflects the natural understanding of the word distance in natural language. Yet when comparing two time series, DTW is to be preferred, especially when applied to real-world data.
1) ED can only be applied to series of equal length. Therefore, when points are missing, ED simply is not computable (unless you also cut the other sequence, thus losing more information).
2) ED does not allow time-shifting or time-warping, as opposed to all algorithms which are based on DTW.
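To illustrate both points, here is a minimal Python sketch of the classic dynamic-programming formulation of DTW for 2D paths. It uses the Euclidean distance between individual locations as the local cost and no warping-window constraint, both of which are choices made for this example; the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(path_a, path_b):
    """Classic O(n*m) dynamic time warping cost between two sequences of (x, y) points."""
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i points of a and the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(path_a[i - 1], path_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat the point of b
                                     cost[i][j - 1],      # repeat the point of a
                                     cost[i - 1][j - 1])  # advance both paths
    return cost[n][m]


path = [(1, 5), (2, 7), (4, 4)]
stalled = [(1, 5), (1, 5), (2, 7), (4, 4)]   # same route, one repeated sample
print(dtw_distance(path, stalled))
```

The two paths have different lengths, so ED is not defined for them, while DTW aligns the repeated sample with its neighbour and reports a cost of zero.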
Thus ED is not a real alternative to DTW, because its requirements and restrictions are much stricter. But to answer your question, I recommend this paper:
Time-series clustering – A decade review
Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, Teh Ying Wah
http://www.sciencedirect.com/science/article/pii/S0306437915000733
This paper gives an overview of the (dis-)similarity measures used in time series clustering. Here is a little excerpt to motivate you to actually read the paper:

If two paths are the same length, say n, then they are really points in a 2n-dimensional space. The first location determines the first two dimensions, the second location determines the next two dimensions, and so on. For example, if we just take the three points in your example, the path can be represented as the single 6-dimensional point (1, 5, 2, 7, 4, 4). If we want to compare this to another three-point path, we can compute either the Euclidean distance (square root of the sum of squares of per-dimension distances between the two points) or the Manhattan distance (sum of the per-dimension differences).
For example, the boring path that stays at (0, 0) for all three times becomes the 6-dimensional point (0, 0, 0, 0, 0, 0). Then the Euclidean distance between this point and your example path is sqrt((1-0)^2 + (5-0)^2 + (2-0)^2 + (7-0)^2 + (4-0)^2 + (4-0)^2) = sqrt(111) ≈ 10.54. The Manhattan distance is abs(1-0) + abs(5-0) + abs(2-0) + abs(7-0) + abs(4-0) + abs(4-0) = 23. This kind of difference between the metrics is not unusual, since the Manhattan distance is provably at least as great as the Euclidean distance.
Of course one problem with this approach is that not all paths will be of the same length. However, you can easily cut off the longer path to the same length as the shorter path, or consider the shorter of the two paths to stay at the same location or moving in the same direction after measurements end, until both paths are the same length. Either approach will introduce some inaccuracies, but no matter what you do you have to deal with the fact that you are missing data on the short path and have to make up for it somehow.
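The second option, letting the shorter path stay at its last location until both paths have the same length, could be sketched like this in Python. The helper names `pad_to_length` and `path_distances` are made up for illustration, and both paths are assumed to be non-empty.

```python
import math


def pad_to_length(path, length):
    """Extend a non-empty path by repeating its last location."""
    return list(path) + [path[-1]] * (length - len(path))


def path_distances(path_a, path_b):
    """Return (Manhattan, Euclidean) distance after padding the shorter path."""
    n = max(len(path_a), len(path_b))
    a, b = pad_to_length(path_a, n), pad_to_length(path_b, n)
    diffs = [(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)]
    manhattan = sum(abs(dx) + abs(dy) for dx, dy in diffs)
    euclidean = math.sqrt(sum(dx * dx + dy * dy for dx, dy in diffs))
    return manhattan, euclidean


example = [(1, 5), (2, 7), (4, 4)]
# A one-point path at the origin is padded to three points at (0, 0)
manhattan, euclidean = path_distances(example, [(0, 0)])
print(manhattan, round(euclidean, 2))
```

With the padded path this reproduces the figures from the worked example above: 23 and roughly 10.54.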
EDIT:
Assuming that path1 and path2 are both List<Tuple<int, int>> objects containing the points, we can cut off the longer list to match the shorter list as:
// Enumerable.Zip stops when it finishes one of the sequences
List<Tuple<int, int, int, int>> matchingPoints = Enumerable.Zip(path1, path2,
    (tupl1, tupl2) =>
        Tuple.Create(tupl1.Item1, tupl1.Item2, tupl2.Item1, tupl2.Item2))
    .ToList(); // Zip returns an IEnumerable, so materialize it as a List
Then, you can use the following code to find the Manhattan distance:
int manhattanDistance = matchingPoints
    .Sum(tupl => Math.Abs(tupl.Item1 - tupl.Item3)
        + Math.Abs(tupl.Item2 - tupl.Item4));
With the same assumptions as for the Manhattan distance, we can generate the Euclidean distance as:
// Math.Pow returns a double, so the sum must be stored in a double
double euclideanDistanceSquared = matchingPoints
    .Sum(tupl => Math.Pow(tupl.Item1 - tupl.Item3, 2)
        + Math.Pow(tupl.Item2 - tupl.Item4, 2));
double euclideanDistance = Math.Sqrt(euclideanDistanceSquared);

There's another question here that might be of some help. If you already have a given path, you can find the closest match by using the cross-track distance algorithm; on the other hand, if you actually want to solve the pattern-recognition problem, you might want to find out more about Levenshtein distance and Elastic Matching (from Wikipedia: "Elastic matching can be defined as an optimization problem of two-dimensional warping specifying corresponding pixels between subjected images").

algorithms for tournament brackets (NCAA, etc.)

I'm trying to implement a bracket in my program (using C#/.NET MVC) and I am stuck trying to figure out some algorithm.
For example, I have a bracket like this with 8 entries (A,B,C,D,E,F,G,H)
I'm trying to figure out if there's an algorithmic way to
1) depending on # of entries, find # of games per round
2) depending on # of entries, for a specific game #, what is the corresponding game # in the next round?
For example, in this case, for 8 entries, the examples are:
1) for round 1, there are 4 games. Round 2, 2 games. Round 3, 1 game
2) game 2 in round 1 corresponds to game 5 in round 2.
I also thought about storing this info in a table, but it seems overkill since it never changes, but here it is anyway:
Any help will be greatly appreciated!
Cheers,
Dean

C# code for the first part of your question:
// N = Initial Team Count
// R = Zero-Based Round #
// Games = (N / (2 ^ R)) / 2
public double GamesPerRound(int totalTeams, int currentRound) {
    var result = (totalTeams / Math.Pow(2, currentRound)) / 2;
    // Happens if you exceed the maximum possible rounds given number of teams
    if (result < 1.0F) throw new InvalidOperationException();
    return result;
}
The next step in solving part (2) is to know the minimum game number for a given round. An intuitive way to do that is with a for loop, though there's probably a better method:
var totalTeams = 8;
var selectedRound = 2; // one-based in this loop
var firstGame = 1;
// If we start with round 1, this doesn't execute and firstGame remains at 1
for (var currentRound = 1; currentRound < selectedRound; currentRound++) {
    // GamesPerRound expects a zero-based round and returns a double
    var gamesPerRound = (int)GamesPerRound(totalTeams, currentRound - 1);
    firstGame += gamesPerRound;
}

Quoting @Yuck, who answered the first question perfectly.
C# code for the first part of your question:
// N = Initial Team Count
// R = Zero-Based Round #
// Games = (N / (2 ^ R)) / 2
public double GamesPerRound(int totalTeams, int currentRound) {
    var result = (totalTeams / Math.Pow(2, currentRound)) / 2;
    // Happens if you exceed the maximum possible rounds given number of teams
    if (result < 1.0F) throw new InvalidOperationException();
    return result;
}
Moving on to the second question:
// G = current game
// T = total teams
// Next round game = (T / 2) + RoundedUp(G / 2)
// e.g.: G = 2, T = 8
// Next round game = (8 / 2) + RoundedUp(2 / 2) = 5
public int NextGame(int totalTeams, int currentGame) {
    return (totalTeams / 2) + (int)Math.Ceiling((double)currentGame / 2);
}
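The arithmetic from both parts can be checked together in a short Python sketch. It assumes a power-of-two number of teams, one-based round numbers, and games numbered sequentially from 1 across rounds, as in the question's 8-entry example. The function names are made up for illustration.

```python
def games_per_round(total_teams, round_number):
    """Number of games in a one-based round, for a power-of-two field."""
    games = total_teams >> round_number   # halve the field once per round
    if games < 1:
        raise ValueError("round exceeds the number of rounds for this field")
    return games


def first_game_of_round(total_teams, round_number):
    """Game number of the first game in a round (games are numbered from 1)."""
    return 1 + sum(games_per_round(total_teams, r) for r in range(1, round_number))


def next_game(total_teams, current_game):
    """Game that the winner of current_game advances to (not meaningful for the final)."""
    return total_teams // 2 + (current_game + 1) // 2   # (g + 1) // 2 rounds g / 2 up


print([games_per_round(8, r) for r in (1, 2, 3)])
print(first_game_of_round(8, 2), next_game(8, 2))
```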

I actually worked this out myself fairly recently and stumbled on (that is, I worked it out, but it has probably been discovered before) a neat recursive solution.
You start with your list of players, in a list that is sorted in seed order. This will be important later.
The overall algorithm consists of splitting the list of players into two and then creating two sub-tournaments. The winners of the two sub-tournaments will end up the grand final of the overall tournament.
Required Objects
Player
    Name
    Seed
Match
    Home Player
    Away Player
    Next Match (pointer to the Match the winner goes to)
Splitting the Lists
Most tournaments put the top seeded player against the bottom seeded player in round one. In order to do this I used the following algorithm, but you could just put the first n / 2 players in one list and the rest in the other list to create a tournament where seeds 1 and 2 play off in round one (and seed 3 plays 4, 5 plays 6 etc).
I'll note here that the neat thing about having top seed play bottom seed is that with this algorithm, if you don't have a power-of-two number of players, the top seed(s) will get a bye in the early rounds.
1. Take the first player and put them in the "left" list
2. Take the next two players (or the last player) and put them in the "right" list
3. Take the next two players and put them in the "left" list
4. Repeat from step 2 until there are no more players.
Of course, if there are only two players in the list you simply create a match between them and return.
Building the Tournament
So you start out with a list of say, 64 players. You split it into two lists of 32 players and recursively create two sub tournaments. The methods that you call recursively should return the Matches that represent the sub-tournament's grand final match (the semi-final of your overall tournament). You can then create a match to be the grand final of your overall tournament and set the nextMatch of the semi final matches to be this grand final.
Things to Consider
You'll need to break out of recursing if there are only two players in the list that's passed in.
If your split gives you a list of one, you shouldn't recurse with it. Just create a sub-tournament with the other list (it should only have two players, so it will return with a match immediately), then set the home team of the new match to be the single player and point the sub-tournament's nextMatch at the new match.
If you want to be able to keep track of rounds, you'll need to pass a recursion depth integer; increment it when you create a sub-tournament.
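A compact Python sketch of the recursion described above follows. It is one possible reading of the description rather than the answerer's actual code: the `Player` and `Match` classes follow the "Required Objects" list, while the `depth` field, the `split` and `build` functions, and the `matches` list used to collect results are assumptions added for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Player:
    name: str
    seed: int


@dataclass(eq=False)
class Match:
    depth: int                             # recursion depth: 0 is the grand final
    home: Optional[Player] = None
    away: Optional[Player] = None
    next_match: Optional["Match"] = None   # the match the winner goes to


def split(players):
    """First player left, next two right, next two left, and so on."""
    left, right = [players[0]], []
    target = right
    for i in range(1, len(players), 2):
        target.extend(players[i:i + 2])
        target = left if target is right else right
    return left, right


def build(players, matches, depth=0):
    """Return the final match of the (sub-)tournament for a seed-ordered player list."""
    if len(players) < 2:
        raise ValueError("need at least two players")
    if len(players) == 2:
        match = Match(depth, players[0], players[1])
        matches.append(match)
        return match
    final = Match(depth)
    matches.append(final)
    for side in split(players):
        if len(side) == 1:
            final.home = side[0]   # a single player gets a bye into this match
        else:
            build(side, matches, depth + 1).next_match = final
    return final


players = [Player(f"P{seed}", seed) for seed in range(1, 9)]
matches = []
grand_final = build(players, matches)
first_round = sorted((m.home.seed, m.away.seed) for m in matches if m.home and m.away)
print(first_round)
```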
Hope this helps, let me know if you need any clarification :)

So basically it's an elimination contest.
So just have a List.
The algorithm will always put the first and second teams together if the number of teams is even. You then increase the counter by two and repeat.
If the number of teams is odd, do pretty much the same thing, except that you randomly select a winner of the first round and put it against the odd team.
After the first round you repeat the algorithm the same way.
A+1
C+1
...
For example, I have a bracket like this with 8 entries (A,B,C,D,E,F,G,H)
You should be able to figure out how to parse this. This seems like a homework question.

Consider renumbering the games (you can always renumber them back afterwards):
if the final is 1,
the semis are 2 and 3,
then the problem has well-published solutions: the ahnentafel (German for ancestor table) has been used for a long time by genealogists - http://en.wikipedia.org/wiki/Ahnentafel
One interesting part of this is that the binary notation of the game # gives a lot of information about the structure of the tournament and where in the tree the match is.
Also note that, as every match knocks out 1 competitor, for n competitors there will be n-1 matches.
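With this numbering the bracket relationships reduce to integer arithmetic, as in a binary heap. A small Python sketch, assuming a full bracket with a power-of-two number of competitors; the function names are made up for illustration.

```python
def next_game(game):
    """Game the winner advances to; None for the final (game 1)."""
    return game // 2 if game > 1 else None


def feeder_games(game, total_games):
    """The two games whose winners meet in this game (empty for first-round games)."""
    return [g for g in (2 * game, 2 * game + 1) if g <= total_games]


def rounds_before_final(game):
    """0 for the final, 1 for the semis, ...: one less than the binary length of the game #."""
    return game.bit_length() - 1


competitors = 8
total_games = competitors - 1   # every match knocks out exactly one competitor
print(next_game(5), feeder_games(1, total_games), rounds_before_final(7))
```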

Getting the distance between geographically placed objects

I have an application (a CAD-like editor) in which the user may place a lot of positional nodes (defined by their latitude and longitude). The application needs to tell the user if these nodes are very close together; a common design error is to overlap them. What the app is doing is rather like what ReSharper or CodeRush do in the IDE: it parses the project and reports problems. A project may have several thousand of these nodes, so testing for overlap becomes quadratically more time-consuming as the numbers grow. At the moment I am using two for loops to get through the list.
for (int i = 0; i < count - 1; i++) {
    for (int j = i + 1; j < count; j++) {
        // check the distance between the i and j elements and if less than
        // a predetermined value report it
    }
}
I would like to get the process of identifying these into 'real time' so that the user is told at once in the event that an overlap occurs. The looping is expensive as is the testing of the distance between nodes. Most of the comparisons are wasted of course.
I just wonder if there is another way. I have thought about sorting the list by lat and lon and comparing adjacent elements but I suspect that will not work nor necessarily be faster.
My other thought is to move the activity to another thread (I really have no experience of using multiple threads) so that the values are being updated - for example storing a reference to nearby nodes in each object. However I can imagine needing to clone the object tree for the background thread so that there is no conflict with the foreground.

You could look into tessellation.
Executing this on another thread is a completely separate issue; you could do that with your nested loops as well as with a more efficient algorithm.

Since the user is placing these locations, the ideal solution would be to assume all previously placed points are not nearby, and with each new point or moved point, check against the rest -- this becomes a single for loop, which should be reasonable.
Another, more ideal solution however:
Let position of A be lat, lng.
Convert both lat and lng to a fixed-length representation
(truncating the degree of precision to a value below which
you are sure they will overlap)
Let xa = lat . lng (concatenate both)
Let ya = lng . lat
Given some position B, find xb, yb.
B is 'close' to A iff | xa - xb | < delta, | ya - yb | < delta.
So what you can do is sort the values of xi, yi for all your input points. When you want to check for points that are too close, you need to traverse through the list of xi's to find points that are too close together (linear time, since they will be adjacent) and then check if the same points in the list of yi's are too close.

Thanks to the insights given here I think I have found a reasonable way to deal with this.
* First, sort all the points in latitude order.
* Then loop through the sorted list.
* For each entry in the list, compare the latitude offset (distance in meters) to the entries below it.
* If the latitude offset is less than the test distance, check the actual distance from point to point, and if it is inside the test radius report the issue.
* When the latitude offset exceeds the test distance, move on to the next point in the list and repeat the tests.
I have not written the code yet, but it seems to me that if there are no overlaps then I will make a single pass. Since overlaps are rare in practice, the total number of tests is likely to be quite small. For 1000 points the current method makes about 500,000 tests. The new method is unlikely to make more than a few thousand.
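The steps above can be sketched in Python as follows. The haversine formula on a spherical Earth is used for the point-to-point distance, which is an assumption made for this example (the post does not say how distance is measured), and the names `haversine_m` and `close_pairs` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, spherical approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def close_pairs(points, radius_m):
    """Return index pairs (i, j), i < j, of points closer together than radius_m."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])   # sort by latitude
    max_dlat = math.degrees(radius_m / EARTH_RADIUS_M)   # radius as a latitude offset
    pairs = []
    for pos, i in enumerate(order):
        for k in range(pos + 1, len(order)):
            j = order[k]
            if points[j][0] - points[i][0] > max_dlat:
                break   # every later point is even further away in latitude
            if haversine_m(points[i], points[j]) <= radius_m:
                pairs.append((min(i, j), max(i, j)))
    return pairs


points = [(51.5, -0.1), (51.50001, -0.10001), (48.85, 2.35), (51.5, 5.0)]
print(close_pairs(points, radius_m=5.0))
```

The early break is safe because the great-circle distance between two points is never smaller than their separation in latitude alone.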
