NegaScout with Zobrist Transposition Tables in Chess - C#

I'm trying to add transposition tables to my alpha-beta scout. I think I see an incremental speed boost toward the mid or late game; however, even with a table size of 1-2 GB, the engine is sometimes no faster than it is without reading from the transposition table at all. I'm also noticing some less efficient moves compared to playing the exact same game without the tables.
I tested my Zobrist key hashing, and the keys come out correctly even after making and undoing moves, so I don't believe they are the issue. I tried to follow the advice of these articles in designing the alpha/beta pruning: http://web.archive.org/web/20070809015843/http://www.seanet.com/~brucemo/topics/hashing.htm http://mediocrechess.blogspot.com/2007/01/guide-transposition-tables.html
Can anyone help me identify a mistake? Perhaps I'm misunderstanding how to check the hash entry's score against alpha and beta. Or is 1-2 GB too small to make a difference? I can post more of the transposition table code if needed.
// !!!! With or without this specific section (and any other Transpose.Insert), the game doesn't play or evaluate any faster.
HashType type = HashType.AlphaPrune;
HashEntry h = Transpose.GetInstance().Get(board.zobristKey);
if (h != null)
{
    if (h.depth >= depth)
    {
        if (h.flag == HashType.ExactPrune)
        {
            return h.scored;
        }
        if (h.flag == HashType.BetaPrune)
        {
            if (h.scored < beta)
            {
                beta = h.scored;
            }
        }
        if (h.flag == HashType.AlphaPrune)
        {
            if (h.scored > alpha)
            {
                alpha = h.scored;
            }
        }
        if (alpha >= beta)
        {
            return alpha;
        }
    }
}
if (board.terminal)
{
    int scoredState = board.Evaluate(color);
    Transpose.GetInstance().Add(board.zobristKey, depth, HashType.ExactPrune, scoredState);
    return scoredState;
}
// May do quiescence search here if necessary && depth == 0
Stack movesGenerated = GeneratePossibleMoves();
while (!movesGenerated.isEmpty())
{
    int scoredState = MAXNEGASCOUT;
    board.MakeMove(movesGenerated.pop());
    int newAlpha = -(alpha + 1);
    scoredState = -alphaBetaScout(board, depth - 1, newAlpha, -alpha, !color, quiscence);
    if (scoredState < beta && alpha < scoredState)
    {
        scoredState = -alphaBetaScout(board, depth - 1, -beta, -scoredState, !color, quiscence);
    }
    board.UndoMove();
    if (scoredState >= beta)
    {
        Transpose.GetInstance().Add(board.zobristKey, depth, HashType.BetaPrune, beta);
        return scoredState;
    }
    if (scoredState > alpha)
    {
        type = HashType.ExactPrune;
        alpha = scoredState;
    }
}
Transpose.GetInstance().Add(board.zobristKey, depth, type, alpha);
return alpha;

I believe you need to make a copy of your alpha and beta bounds before you probe the table at the beginning of the node. When you later update your bounds (from the table or by searching), those copies do not change.
Then, when you add new entries to your transposition table, compare scoredState against the bounds you saved at the start instead of the updated bounds. The flag you store has to describe the score relative to the window you actually entered the node with, and that is the original window, not the updated one.
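A minimal sketch of that store pattern, reusing the question's names (alphaOrig, betaOrig, and bestScore are illustrative):
int alphaOrig = alpha;  // snapshot the window before the table probe
int betaOrig = beta;

// ... probe the table, loop over moves, possibly raising alpha ...

// When storing, classify bestScore against the ORIGINAL window:
HashType type;
if (bestScore <= alphaOrig)
    type = HashType.AlphaPrune;   // upper bound on the true score
else if (bestScore >= betaOrig)
    type = HashType.BetaPrune;    // lower bound on the true score
else
    type = HashType.ExactPrune;   // exact score
Transpose.GetInstance().Add(board.zobristKey, depth, type, bestScore);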

Related

Principal variation in chess engine - How could I rewrite this C code into C#?

I need help rewriting this code (the blue parts in the link) in C#. I have little experience with programming in C or C++, so I am not sure how exactly I should interpret the tagMove structure and the -> operator in C#.
http://web.archive.org/web/20040427013839/brucemo.com/compchess/programming/pv.htm
If somebody doesn't want to click the link, I also posted the code here:
typedef struct LINE
{
    int cmove;              // Number of moves in the line.
    MOVE argmove[moveMAX];  // The line.
} LINE;

int AlphaBeta(int depth, int alpha, int beta, LINE * pline)
{
    LINE line;
    int val;

    if (depth == 0) {
        pline->cmove = 0;
        return Evaluate();
    }
    GenerateLegalMoves();
    while (MovesLeft()) {
        MakeNextMove();
        val = -AlphaBeta(depth - 1, -beta, -alpha, &line);
        UnmakeMove();
        if (val >= beta) return beta;
        if (val > alpha) {
            alpha = val;
            pline->argmove[0] = ThisMove();
            memcpy(pline->argmove + 1, line.argmove, line.cmove * sizeof(MOVE));
            pline->cmove = line.cmove + 1;
        }
    }
    return alpha;
}
You mention that you don't know what the "->" operator becomes in C#: there it is just a ".". If something exposes a property, you access it with a simple dot. Let's say we have an object Country with some values; to access each one:
Country.Capital
Country.President
etc...
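Since C# has no memcpy or pointer-to-struct syntax, here is a rough illustration only (not a drop-in rewrite) of how that LINE code might map over; Line, Move, and MaxMoves are hypothetical stand-ins for your own types:
public class Line
{
    public const int MaxMoves = 64;           // stands in for moveMAX
    public int MoveCount;                     // cmove
    public Move[] Moves = new Move[MaxMoves]; // argmove
}

// inside AlphaBeta, in place of the memcpy:
pline.Moves[0] = ThisMove();
Array.Copy(line.Moves, 0, pline.Moves, 1, line.MoveCount);
pline.MoveCount = line.MoveCount + 1;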
Now, the problem with your question is that it is extremely basic, and there is nothing wrong with that; we all started somewhere. But starting work on a chess engine at your level does feel odd. I'm not sure anyone is going to take the time to rewrite all of it for you, and you will certainly encounter other problems along the way that you won't know how to solve. My advice would be to put this on hold, focus on the basics, and then come back and smash that code with your brand-new knowledge.

Getting N x N dimension data from quad tree is very slow in c#

I am using a quad-tree structure in my data-processing application in C#; it is similar to the hashlife algorithm. Getting N x N (e.g. 2000 x 2000) data out of the quad-tree is very slow.
How can I optimize it for extracting large amounts of data from the quad-tree?
Edit:
Here is the code I use to extract the data recursively:
public int Getvalue(long x, long y)
{
    if (level == 0)
    {
        return value;
    }
    long offset = 1 << (level - 2);
    if (x < 0)
    {
        if (y < 0)
        {
            return NW.Getvalue(x + offset, y + offset);
        }
        else
        {
            return SW.Getvalue(x + offset, y - offset);
        }
    }
    else
    {
        if (y < 0)
        {
            return NE.Getvalue(x - offset, y + offset);
        }
        else
        {
            return SE.Getvalue(x - offset, y - offset);
        }
    }
}
Outer code:
int limit = 500;
List<int> ExData = new List<int>();
for (int row = -limit; row < limit; row++)
{
    for (int col = -limit; col < limit; col++)
    {
        ExData.Add(Root.Getvalue(row, col));
        // sometimes a two-dimensional array
    }
}
A quadtree or any other structure isn't going to help if you're going to visit every element (i.e. every level-0 leaf node). Whatever code gets the value in a given cell, an exhaustive tour will visit 4,000,000 points, and your version redoes the arithmetic all the way down the tree on every single visit.
So for element (-limit, -limit) the code visits every tier and then returns; for the next element it visits every tier again, and so on. That is very laborious.
It will speed up if you make the appending to the list itself recursive, visiting each quadrant once.
NB: I'm not a C# programmer so please correct any errors here:
public void AppendValues(List<int> ExData) {
    if (level == 0) {
        ExData.Add(value);
    } else {
        NW.AppendValues(ExData);
        NE.AppendValues(ExData);
        SW.AppendValues(ExData);
        SE.AppendValues(ExData);
    }
}
That will append all the values though not in the raster-scan (row-by-row) order of the original code!
A further speed-up can be achieved if you are dealing with sparse data. If many nodes are empty or even 'solid' (all one value), you can set those nodes to null and just use zero or the solid value for the whole subtree.
That trick works well in Hashlife for Conway's Life, but it depends on your application. Interesting patterns have large areas of 'dead' cells that will always propagate to dead and rarely need considering in detail.
I'm not sure what 25-40% means as 'duplicates'. If they aren't some fixed value, or are scattered across the tree, large 'solid' regions are likely to be rare, and that trick may not help here.
Also, if you actually need only the values in some region (e.g. a rectangle), you need to be a bit cleverer about working out which sub-region of each quadrant you need, using offset, but it will still be far more efficient than a brute-force tour of every element. Make sure the code notices when the region of interest is entirely outside the node in hand and returns quickly.
All this said if creating a list of all the values in the quad-tree is a common activity in your application, a quad-tree may not be the answer you need. A map simply mapping (row,col) to value is pre-made and again very efficient if there is some common default value (e.g. zero).
It may help to create an iterator object rather than adding millions of items to a list, particularly if the list is transient and destroyed soon after.
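For instance, a minimal iterator version of the recursive walk, assuming the same level/value fields and NW/NE/SW/SE children as the code above (note that nested iterators add their own per-element overhead, so profile it):
public IEnumerable<int> Values()
{
    if (level == 0)
    {
        yield return value;
    }
    else
    {
        // visit each quadrant once, streaming values instead of buffering them
        foreach (int v in NW.Values()) yield return v;
        foreach (int v in NE.Values()) yield return v;
        foreach (int v in SW.Values()) yield return v;
        foreach (int v in SE.Values()) yield return v;
    }
}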
More information about the actual application is required to understand if a quadtree is the answer here. The information provided so far suggests it isn't.

GLSL Spinlock Blocking Forever

I am trying to implement a spinlock in GLSL, to be used in the context of voxel cone tracing. I moved the information that stores the lock state into a separate 3D texture which allows atomic operations. In order not to waste memory I don't use a full integer per lock but only a single bit. The problem is that, without limiting the maximum number of iterations, the loop never terminates. I implemented the exact same mechanism in C#, created a lot of tasks working on shared resources, and there it works perfectly.
The book Euro-Par 2017: Parallel Processing, page 274 (it can be found on Google), mentions possible caveats when using locks on SIMT devices. I think the code should bypass those caveats.
Problematic GLSL Code:
void imageAtomicRGBA8Avg(layout(RGBA8) volatile image3D image, layout(r32ui) volatile uimage3D lockImage,
                         ivec3 coords, vec4 value)
{
    ivec3 lockCoords = coords;
    uint bit = 1 << (lockCoords.z & 31); // 1 << (coord.z % 32)
    lockCoords.z = lockCoords.z >> 5;    // Division by 32
    uint oldValue = 0;
    //int counter = 0;
    bool goOn = true;
    while (goOn /*&& counter < 10000*/)
    //while (true)
    {
        uint newValue = oldValue | bit;
        uint result = imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
        // Writing is allowed if we could write our value and the bit indicating the lock was not already set
        if (result == oldValue && (result & bit) == 0)
        {
            vec4 rval = imageLoad(image, coords);
            rval.rgb = (rval.rgb * rval.a); // Denormalize
            vec4 curValF = rval + value;    // Add
            curValF.rgb /= curValF.a;       // Renormalize
            imageStore(image, coords, curValF);
            // Release the lock and set the flag so that the loops terminate
            bit = ~bit;
            oldValue = 0;
            while (goOn)
            {
                newValue = oldValue & bit;
                result = imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
                if (result == oldValue)
                    goOn = false; // break;
                oldValue = result;
            }
            // break;
        }
        oldValue = result;
        //++counter;
    }
}
Working C# code with identical functionality
public static void Test()
{
    int buffer = 0;
    int[] resource = new int[2];
    Action testA = delegate ()
    {
        for (int i = 0; i < 100000; ++i)
            imageAtomicRGBA8Avg(ref buffer, 1, resource);
    };
    Action testB = delegate ()
    {
        for (int i = 0; i < 100000; ++i)
            imageAtomicRGBA8Avg(ref buffer, 2, resource);
    };
    Task[] tA = new Task[100];
    Task[] tB = new Task[100];
    for (int i = 0; i < tA.Length; ++i)
    {
        tA[i] = new Task(testA);
        tA[i].Start();
        tB[i] = new Task(testB);
        tB[i].Start();
    }
    for (int i = 0; i < tA.Length; ++i)
        tA[i].Wait();
    for (int i = 0; i < tB.Length; ++i)
        tB[i].Wait();
}

public static void imageAtomicRGBA8Avg(ref int lockImage, int bit, int[] resource)
{
    int oldValue = 0;
    int counter = 0;
    bool goOn = true;
    while (goOn /*&& counter < 10000*/)
    {
        int newValue = oldValue | bit;
        int result = Interlocked.CompareExchange(ref lockImage, newValue, oldValue); // imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
        if (result == oldValue && (result & bit) == 0)
        {
            // Now we hold the lock and can write safely
            resource[bit - 1]++;
            bit = ~bit;
            oldValue = 0;
            while (goOn)
            {
                newValue = oldValue & bit;
                result = Interlocked.CompareExchange(ref lockImage, newValue, oldValue); // imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
                if (result == oldValue)
                    goOn = false; // break;
                oldValue = result;
            }
            // break;
        }
        oldValue = result;
        ++counter;
    }
}
The locking mechanism should work much like the one described in OpenGL Insights, Chapter 22, "Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer" by Cyril Crassin and Simon Green. They just use integer textures to store the colors for every voxel, which I would like to avoid because it complicates mipmapping and other things.
I hope the post is understandable; I get the feeling it is already becoming too long...
Why does the GLSL implementation not terminate?
If I understand you correctly, you use lockImage as a thread lock: a particular value at particular coords means "only this shader instance can do the next operations" (change data in the other image at those coords). Right.
The key is imageAtomicCompSwap. We know it did the job when it was able to store the "locked" value; let's say 0 means "free" and 1 means "locked". We know it worked because the returned value (the original one) is "free", i.e. the swap operation happened:
bool goOn = true;
uint oldValue = 0; // free
uint newValue = 1; // locked
// Wait for another shader instance to free the simulated lock
while (goOn)
{
    uint result = imageAtomicCompSwap(lockImage, lockCoords, oldValue, newValue);
    if (result == oldValue) // it was free, now it's locked
    {
        // Only this shader instance executes the next lines now.
        // Other instances will find a "locked" value in 'lockImage' and will wait.
        ...
        // Release our simulated lock
        imageAtomicCompSwap(lockImage, lockCoords, newValue, oldValue);
        goOn = false;
    }
}
I think your code loops forever because you complicated your life with the bit variable and misused oldValue and newValue.
EDIT:
If the 'z' extent of the lockImage is a multiple of 32 (just a hint for understanding, no exact multiple needed), you are trying to pack 32 voxel-locks into one integer. Let's call this integer 32C.
A shader instance ("SI") may want to change its bit in 32C, to lock or unlock. So you must (A) get the current value and (B) change only your bit.
Other SIs are trying to change their bits; some with the same bit, others with different bits.
Between two calls to imageAtomicCompSwap in one SI, another SI may have changed not your bit (it's locked, no?) but other bits in the same 32C value. You don't know the current value; you know only your bit. Thus you have nothing (or a stale value) to compare with in the imageAtomicCompSwap call, so it likely fails to set a new value. Several SIs failing this way leads to "deadlocks", and the while-loop never ends.
You try to avoid using a stale value by doing oldValue = result and trying imageAtomicCompSwap again. This is the (A)-(B) sequence I described before. But between (A) and (B) yet another SI may have changed the 32C value, ruining your idea.
IDEA:
You can use my simple approach (just 0 or 1 values in lockImage) without the bit packing, while keeping lockImage at its reduced size: one 0/1 lock per 32C word. The trade-off is that all shader instances trying to update any of the 32 image coords covered by one 32C value will wait until whoever locked that value frees it.
Using a second lockImage2 just to lock and unlock the 32C value for a single-bit update seems like too much spinning.
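For reference, a minimal C# analog of that simpler scheme, in the same style as the asker's System.Threading.Interlocked test harness (the lock word holds only 0 = free or 1 = locked, no bit packing; names are illustrative):
public static void LockedUpdate(ref int lockWord, int[] resource, int index)
{
    // spin until we swap 0 -> 1, i.e. until we acquire the lock
    while (Interlocked.CompareExchange(ref lockWord, 1, 0) != 0) { }
    resource[index]++;                     // critical section
    Interlocked.Exchange(ref lockWord, 0); // release
}
On the GPU the caveat from the Euro-Par reference still applies: threads within one SIMT warp cannot safely spin on a lock held by another thread in the same warp, which a CPU test like this will never reproduce.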
I have written an article about how to implement a per-pixel mutex in a fragment shader, along with code; I think you can refer to that. You are doing something pretty similar to what I explain there. Here we go:
Getting Over Draw Count and Per Pixel Mutex
What is overdraw count?
Mostly on embedded hardware, the major cause of a performance drop can be overdraw: one pixel on screen is shaded multiple times by the GPU because of the nature of the geometry or scene we are drawing. There are many tools to visualize overdraw count.
Details about overdraw
When we draw some vertices, they are transformed to clip space and then to window coordinates. The rasterizer then maps these coordinates to pixels/fragments, and for each pixel/fragment the GPU calls the pixel shader. There can be cases where we draw multiple instances of geometry and blend them, shading the same pixel multiple times. This leads to overdraw and can degrade performance.
Strategies to avoid overdraw
Consider frustum culling - do frustum culling on the CPU so that objects outside the camera's field of view are not rendered.
Sort objects based on z - draw objects from front to back; that way the z-test fails for later objects and their fragments are not written.
Enable back-face culling - this avoids rendering the faces that point away from the camera.
If you look at point 2: for blending we render in exactly the reverse order, from back to front. We have to, because blending happens after the z-test: if a fragment fails the z-test, it is completely ignored even though blending is on, and we would get artifacts. Hence we need to maintain back-to-front order, and that is why the overdraw count rises when blending is enabled.
Why do we need a per-pixel mutex?
The GPU is parallel by nature, so pixels are shaded in parallel: many instances of the pixel shader run at once, and several of them may be shading, and therefore accessing, the same pixel. That can lead to synchronization issues and unwanted effects. In this application I maintain an overdraw count in an image buffer initialized to 0, and the operations happen in the following order:
Read the ith pixel's count from the image buffer (zero the first time)
Add 1 to the counter value read in step 1
Store the new counter value at the ith pixel in the image buffer
As I said, multiple instances of the pixel shader may be working on the same pixel, and since these steps are not atomic, the counter can get corrupted. I could have used the built-in function imageAtomicAdd(); I wanted to show how to implement a per-pixel mutex, so I did not use imageAtomicAdd().
#version 430
layout(binding = 0, r32ui) uniform uimage2D overdraw_count;
layout(binding = 1, r32ui) uniform uimage2D image_lock;

void mutex_lock(ivec2 pos) {
    uint lock_available;
    do {
        // returns the previous value: 0 means the swap succeeded and we hold the lock
        lock_available = imageAtomicCompSwap(image_lock, pos, 0, 1);
    } while (lock_available != 0);
}

void mutex_unlock(ivec2 pos) {
    imageStore(image_lock, pos, uvec4(0));
}

out vec4 color;

void main() {
    mutex_lock(ivec2(gl_FragCoord.xy));
    uint count = imageLoad(overdraw_count, ivec2(gl_FragCoord.xy)).x + 1;
    imageStore(overdraw_count, ivec2(gl_FragCoord.xy), uvec4(count));
    mutex_unlock(ivec2(gl_FragCoord.xy));
}
Fragment_Shader.fs
About the demo
In the demo video you can see we are rendering many teapots with blending on, so pixels with higher intensity have a higher overdraw count.
on youtube
Note: On Android you can see this overdraw count in the debug GPU options.
source: Per Pixel Mutex

C# XNA: Optimizing Collision Detection?

I'm working on a simple demo for collision detection, which contains only a bunch of objects bouncing around in the window. (The goal is to see how many objects the game can handle at once without dropping frames.)
There is gravity, so the objects are either moving or else colliding with a wall.
The naive solution was O(n^2):
foreach Collidable c1:
    foreach Collidable c2:
        checkCollision(c1, c2);
This is pretty bad. So I set up CollisionCell objects, which maintain information about a portion of the screen. The idea is that each Collidable only needs to check for the other objects in its cell. With 60 px by 60 px cells, this yields almost a 10x improvement, but I'd like to push it further.
A profiler has revealed that the code spends 50% of its time in the function each cell uses to get its contents. Here it is:
// all the objects in this cell
public ICollection<GameObject> Containing
{
    get
    {
        ICollection<GameObject> containing = new HashSet<GameObject>();
        foreach (GameObject obj in engine.GameObjects) {
            // 20% of processor time spent in this conditional
            if (obj.Position.X >= bounds.X &&
                obj.Position.X < bounds.X + bounds.Width &&
                obj.Position.Y >= bounds.Y &&
                obj.Position.Y < bounds.Y + bounds.Height) {
                containing.Add(obj);
            }
        }
        return containing;
    }
}
20% of the program's total time is spent in that conditional alone.
Here is where the above function gets called:
// Get a list of lists of cell contents
List<List<GameObject>> cellContentsSet = cellManager.getCellContents();
// foreach item, only check items in the same cell
foreach (List<GameObject> cellMembers in cellContentsSet) {
    foreach (GameObject item in cellMembers) {
        // process collisions
    }
}
// ...
// Gets a list of lists of cell contents (each sub-list = 1 cell)
internal List<List<GameObject>> getCellContents() {
    List<List<GameObject>> result = new List<List<GameObject>>();
    foreach (CollisionCell cell in cellSet) {
        result.Add(new List<GameObject>(cell.Containing.ToArray()));
    }
    return result;
}
Right now, I have to iterate over every cell, even empty ones. Perhaps this could be improved somehow, but I'm not sure how to verify that a cell is empty without looking at it. (Maybe I could implement something like the sleeping objects found in some physics engines: if an object is still for a while, it goes to sleep and is not included in the calculations every frame.)
What can I do to optimize this? (Also, I'm new to C# - are there any other glaring stylistic errors?)
When the game starts lagging, the objects tend to be packed fairly tightly, so there's not much motion going on. Perhaps I can take advantage of this by writing a function that checks whether, given an object's current velocity, it can possibly leave its current cell before the next call to Update().
UPDATE 1 I decided to maintain a list of the objects found in the cell at the last update, and to check those first to see if they are still in the cell. I also gave CollisionCell an area variable: once the cell is full, I can stop looking. Here is my implementation of that, and it made the whole demo much slower:
// all the objects in this cell
private ICollection<GameObject> prevContaining;
private ICollection<GameObject> containing;
internal ICollection<GameObject> Containing {
    get {
        return containing;
    }
}
/**
 * To ensure that `containing` and `prevContaining` are up to date, this MUST be called once per Update() loop in which it is used.
 * What is a good way to enforce this?
 */
public void updateContaining()
{
    ICollection<GameObject> result = new HashSet<GameObject>();
    uint area = checked((uint) bounds.Width * (uint) bounds.Height); // the area of this cell
    // first, try to fill up this cell with objects that were in it previously
    ICollection<GameObject>[] toSearch = new ICollection<GameObject>[] { prevContaining, engine.GameObjects };
    foreach (ICollection<GameObject> potentiallyContained in toSearch) {
        if (area > 0) { // redundant, but faster?
            foreach (GameObject obj in potentiallyContained) {
                if (obj.Position.X >= bounds.X &&
                    obj.Position.X < bounds.X + bounds.Width &&
                    obj.Position.Y >= bounds.Y &&
                    obj.Position.Y < bounds.Y + bounds.Height) {
                    result.Add(obj);
                    area -= checked((uint) Math.Pow(obj.Radius, 2)); // assuming objects are square
                    if (area <= 0) {
                        break;
                    }
                }
            }
        }
    }
    prevContaining = containing;
    containing = result;
}
UPDATE 2 I abandoned that approach. Now I maintain a pool of collidables (orphans) and remove objects from it when I find a cell that contains them:
internal List<List<GameObject>> getCellContents() {
    List<GameObject> orphans = new List<GameObject>(engine.GameObjects);
    List<List<GameObject>> result = new List<List<GameObject>>();
    foreach (CollisionCell cell in cellSet) {
        cell.updateContaining(ref orphans); // this call will alter orphans!
        result.Add(new List<GameObject>(cell.Containing));
        if (orphans.Count == 0) {
            break;
        }
    }
    return result;
}

// `orphans` is a list of GameObjects that do not yet have a cell
public void updateContaining(ref List<GameObject> orphans) {
    ICollection<GameObject> result = new HashSet<GameObject>();
    for (int i = 0; i < orphans.Count; i++) {
        // 20% of processor time spent in this conditional
        if (orphans[i].Position.X >= bounds.X &&
            orphans[i].Position.X < bounds.X + bounds.Width &&
            orphans[i].Position.Y >= bounds.Y &&
            orphans[i].Position.Y < bounds.Y + bounds.Height) {
            result.Add(orphans[i]);
            orphans.RemoveAt(i);
            i--; // stay at the same index: RemoveAt shifted the next element down
        }
    }
    containing = result;
}
This only yields a marginal improvement, not the 2x or 3x I'm looking for.
UPDATE 3 Again I abandoned the above approaches and decided to let each object maintain its current cell:
private CollisionCell currCell;
internal CollisionCell CurrCell {
    get {
        return currCell;
    }
    set {
        currCell = value;
    }
}
This value gets updated:
// Run 1 cycle of this object
public virtual void Run()
{
    position += velocity;
    parent.CellManager.updateContainingCell(this);
}
CellManager code:
private IDictionary<Vector2, CollisionCell> cellCoords = new Dictionary<Vector2, CollisionCell>();

internal void updateContainingCell(GameObject gameObject) {
    CollisionCell currCell = findContainingCell(gameObject);
    gameObject.CurrCell = currCell;
    if (currCell != null) {
        currCell.Containing.Add(gameObject);
    }
}

// null if no such cell exists
private CollisionCell findContainingCell(GameObject gameObject) {
    if (gameObject.Position.X > GameEngine.GameWidth
        || gameObject.Position.X < 0
        || gameObject.Position.Y > GameEngine.GameHeight
        || gameObject.Position.Y < 0) {
        return null;
    }
    // we'll need to be able to access these outside of the loops
    uint minWidth = 0;
    uint minHeight = 0;
    for (minWidth = 0; minWidth + cellWidth < gameObject.Position.X; minWidth += cellWidth) ;
    for (minHeight = 0; minHeight + cellHeight < gameObject.Position.Y; minHeight += cellHeight) ;
    CollisionCell currCell = cellCoords[new Vector2(minWidth, minHeight)];
    // Make sure `currCell` actually contains gameObject
    Debug.Assert(gameObject.Position.X >= currCell.Bounds.X && gameObject.Position.X <= currCell.Bounds.Width + currCell.Bounds.X,
        String.Format("{0} should be between lower bound {1} and upper bound {2}", gameObject.Position.X, currCell.Bounds.X, currCell.Bounds.X + currCell.Bounds.Width));
    Debug.Assert(gameObject.Position.Y >= currCell.Bounds.Y && gameObject.Position.Y <= currCell.Bounds.Height + currCell.Bounds.Y,
        String.Format("{0} should be between lower bound {1} and upper bound {2}", gameObject.Position.Y, currCell.Bounds.Y, currCell.Bounds.Y + currCell.Bounds.Height));
    return currCell;
}
I thought this would make things better: now I only iterate over collidables, not collidables x cells. Instead, the game is now hideously slow, delivering only a tenth of the performance of my earlier approaches.
The profiler indicates that a different method is now the main hot spot, and the time to get an object's neighbors is trivially short. That method didn't change, so perhaps I'm calling it far more than I used to...
It spends 50% of its time in that function because you call that function a lot. Optimizing that one function will only yield incremental improvements to performance.
Alternatively, just call the function less!
You've already started down that path by setting up a spatial partitioning scheme (lookup Quadtrees to see a more advanced form of your technique).
A second approach is to break your N*N loop into an incremental form and to use a CPU budget.
You can allocate a CPU budget for each of the modules that want action during frame times (during Updates). Collision is one of these modules, AI might be another.
Let's say you want to run your game at 60 fps. This means you have about 1/60 s = 0.0167 s of CPU time to burn between frames. Now we can split those 0.0167 s between our modules. Let's give collision 30% of the budget: 0.005 s.
Now your collision algorithm knows that it can only spend 0.005 s working. So if it runs out of time, it will need to postpone some tasks for later - you will make the algorithm incremental. Code for achieving this can be as simple as:
const double CollisionBudget = 0.005;
Collision[] _allPossibleCollisions;
int _lastCheckedCollision;

void HandleCollisions() {
    var startTime = HighPerformanceCounter.Now;
    if (_allPossibleCollisions == null ||
        _lastCheckedCollision >= _allPossibleCollisions.Length) {
        // Start a new series
        _allPossibleCollisions = GenerateAllPossibleCollisions();
        _lastCheckedCollision = 0;
    }
    for (var i = _lastCheckedCollision; i < _allPossibleCollisions.Length; i++) {
        // Don't go over the budget
        if (HighPerformanceCounter.Now - startTime > CollisionBudget) {
            break;
        }
        _lastCheckedCollision = i;
        if (CheckCollision(_allPossibleCollisions[i])) {
            HandleCollision(_allPossibleCollisions[i]);
        }
    }
}
There, now it doesn't matter how fast the collision code is; it will be done as quickly as possible without affecting the user's perceived performance.
Benefits include:
The algorithm is designed to run out of time, it just resumes on the next frame, so you don't have to worry about this particular edge case.
CPU budgeting becomes more and more important as the number of advanced/time consuming algorithms increases. Think AI. So it's a good idea to implement such a system early on.
Human perception runs at well under 30 Hz, while your frame loop runs at 60 Hz. That gives the algorithm 30 frames to complete its work, so it's OK that it doesn't finish in one.
Doing it this way gives stable, data-independent frame rates.
It still benefits from performance optimizations to the collision algorithm itself.
Collision algorithms are really hunting for the "sub-frame" in which a collision happened. That is, you will never be so lucky as to catch a collision at the exact moment it happens - thinking you do is lying to yourself.
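For what it's worth, here is a sketch of such a sub-frame (time-of-impact) test for two circles moving at constant velocity over one frame; it uses XNA's Vector2, and the names are illustrative:
// Returns the earliest contact time t in [0, 1], or null if no contact this frame.
static float? TimeOfImpact(Vector2 p1, Vector2 v1, float r1,
                           Vector2 p2, Vector2 v2, float r2)
{
    Vector2 dp = p2 - p1; // relative position
    Vector2 dv = v2 - v1; // relative velocity
    float r = r1 + r2;
    // Solve |dp + dv * t| = r, i.e. (dv.dv)t^2 + 2(dp.dv)t + (dp.dp - r^2) = 0
    float a = Vector2.Dot(dv, dv);
    float b = 2f * Vector2.Dot(dp, dv);
    float c = Vector2.Dot(dp, dp) - r * r;
    if (a == 0f) return c <= 0f ? 0f : (float?)null;    // no relative motion
    float disc = b * b - 4f * a * c;
    if (disc < 0f) return null;                         // paths never close to within r
    float t = (-b - (float)Math.Sqrt(disc)) / (2f * a); // earliest root
    return (t >= 0f && t <= 1f) ? t : (float?)null;
}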
I can help here; I wrote my own collision detection library as an experiment. I think I can tell you right now that you won't get the performance you need without changing algorithms. Sure, the naive way is nice, but it only works for so many items before collapsing. What you need is sweep and prune. The basic idea goes like this (from my collision detection library project):
using System.Collections.Generic;
using AtomPhysics.Interfaces;

namespace AtomPhysics.Collisions
{
    public class SweepAndPruneBroadPhase : IBroadPhaseCollider
    {
        private INarrowPhaseCollider _narrowPhase;
        private AtomPhysicsSim _sim;
        private List<Extent> _xAxisExtents = new List<Extent>();
        private List<Extent> _yAxisExtents = new List<Extent>();
        private Extent e1;

        public SweepAndPruneBroadPhase(INarrowPhaseCollider narrowPhase)
        {
            _narrowPhase = narrowPhase;
        }

        public AtomPhysicsSim Sim
        {
            get { return _sim; }
            set { _sim = value; }
        }

        public INarrowPhaseCollider NarrowPhase
        {
            get { return _narrowPhase; }
            set { _narrowPhase = value; }
        }

        public bool NeedsNotification { get { return true; } }

        public void Add(Nucleus nucleus)
        {
            Extent xStartExtent = new Extent(nucleus, ExtentType.Start);
            Extent xEndExtent = new Extent(nucleus, ExtentType.End);
            _xAxisExtents.Add(xStartExtent);
            _xAxisExtents.Add(xEndExtent);
            Extent yStartExtent = new Extent(nucleus, ExtentType.Start);
            Extent yEndExtent = new Extent(nucleus, ExtentType.End);
            _yAxisExtents.Add(yStartExtent);
            _yAxisExtents.Add(yEndExtent);
        }

        public void Remove(Nucleus nucleus)
        {
            // RemoveAll avoids mutating the lists while enumerating them
            _xAxisExtents.RemoveAll(e => e.Nucleus == nucleus);
            _yAxisExtents.RemoveAll(e => e.Nucleus == nucleus);
        }

        public void Update()
        {
            _xAxisExtents.InsertionSort(comparisonMethodX);
            _yAxisExtents.InsertionSort(comparisonMethodY);
            for (int i = 0; i < _xAxisExtents.Count; i++)
            {
                e1 = _xAxisExtents[i];
                if (e1.Type == ExtentType.Start)
                {
                    HashSet<Extent> potentialCollisionsX = new HashSet<Extent>();
                    for (int j = i + 1; j < _xAxisExtents.Count && _xAxisExtents[j].Nucleus.ID != e1.Nucleus.ID; j++)
                    {
                        potentialCollisionsX.Add(_xAxisExtents[j]);
                    }
                    HashSet<Extent> potentialCollisionsY = new HashSet<Extent>();
                    for (int j = i + 1; j < _yAxisExtents.Count && _yAxisExtents[j].Nucleus.ID != e1.Nucleus.ID; j++)
                    {
                        potentialCollisionsY.Add(_yAxisExtents[j]);
                    }
                    List<Extent> probableCollisions = new List<Extent>();
                    foreach (Extent e in potentialCollisionsX)
                    {
                        if (potentialCollisionsY.Contains(e) && !probableCollisions.Contains(e) && e.Nucleus.ID != e1.Nucleus.ID)
                        {
                            probableCollisions.Add(e);
                        }
                    }
                    foreach (Extent e2 in probableCollisions)
                    {
                        if (e1.Nucleus.DNCList.Contains(e2.Nucleus) || e2.Nucleus.DNCList.Contains(e1.Nucleus))
                            continue;
                        NarrowPhase.DoCollision(e1.Nucleus, e2.Nucleus);
                    }
                }
            }
        }

        private bool comparisonMethodX(Extent e1, Extent e2)
        {
            float e1PositionX = e1.Nucleus.NonLinearSpace != null ? e1.Nucleus.NonLinearPosition.X : e1.Nucleus.Position.X;
            float e2PositionX = e2.Nucleus.NonLinearSpace != null ? e2.Nucleus.NonLinearPosition.X : e2.Nucleus.Position.X;
            e1PositionX += (e1.Type == ExtentType.Start) ? -e1.Nucleus.Radius : e1.Nucleus.Radius;
            e2PositionX += (e2.Type == ExtentType.Start) ? -e2.Nucleus.Radius : e2.Nucleus.Radius;
            return e1PositionX < e2PositionX;
        }

        private bool comparisonMethodY(Extent e1, Extent e2)
        {
            float e1PositionY = e1.Nucleus.NonLinearSpace != null ? e1.Nucleus.NonLinearPosition.Y : e1.Nucleus.Position.Y;
            float e2PositionY = e2.Nucleus.NonLinearSpace != null ? e2.Nucleus.NonLinearPosition.Y : e2.Nucleus.Position.Y;
            e1PositionY += (e1.Type == ExtentType.Start) ? -e1.Nucleus.Radius : e1.Nucleus.Radius;
            e2PositionY += (e2.Type == ExtentType.Start) ? -e2.Nucleus.Radius : e2.Nucleus.Radius;
            return e1PositionY < e2PositionY;
        }

        private enum ExtentType { Start, End }

        private sealed class Extent
        {
            private ExtentType _type;
            public ExtentType Type
            {
                get { return _type; }
                set
                {
                    _type = value;
                    _hashcode = 23;
                    _hashcode *= 17 + Nucleus.GetHashCode();
                }
            }

            private Nucleus _nucleus;
            public Nucleus Nucleus
            {
                get { return _nucleus; }
                set
                {
                    _nucleus = value;
                    _hashcode = 23;
                    _hashcode *= 17 + Nucleus.GetHashCode();
                }
            }

            private int _hashcode;

            public Extent(Nucleus nucleus, ExtentType type)
            {
                Nucleus = nucleus;
                Type = type;
                _hashcode = 23;
                _hashcode *= 17 + Nucleus.GetHashCode();
            }

            public override bool Equals(object obj)
            {
                return Equals(obj as Extent);
            }

            public bool Equals(Extent extent)
            {
                return extent != null && this.Nucleus == extent.Nucleus;
            }

            public override int GetHashCode()
            {
                return _hashcode;
            }
        }
    }
}
and here's the code that does the insertion sort (more-or-less a direct translation of the pseudocode here):
/// <summary>
/// Performs an insertion sort on the list.
/// </summary>
/// <typeparam name="T">The type of the list supplied.</typeparam>
/// <param name="list">the list to sort.</param>
/// <param name="comparison">the method for comparison of two elements.</param>
public static void InsertionSort<T>(this IList<T> list, Func<T, T, bool> comparison)
{
    // The pseudocode this was translated from is 1-based; for 0-based
    // lists the outer loop must start at 1 and the inner loop run while j > 0.
    for (int i = 1; i < list.Count; i++)
    {
        for (int j = i; j > 0 && comparison(list[j], list[j - 1]); j--)
        {
            T tempItem = list[j];
            list.RemoveAt(j);
            list.Insert(j - 1, tempItem);
        }
    }
}
IIRC, I was able to get an extremely large performance increase with that, especially when dealing with large numbers of colliding bodies. You'll need to adapt it to your code, but that's the basic premise behind sweep and prune.
The other thing I want to remind you of is that you should use a profiler, like the one made by Red Gate. There's a free trial which should last you long enough.
It looks like you are looping through all the game objects just to see which objects are contained in a cell. A better approach would be to store, for each cell, the list of game objects that are in it. If you do that, and each object knows which cells it is in, then moving objects between cells should be easy; see the sketch below. This seems like it will yield the biggest performance gain.
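A sketch of what that might look like, reusing the question's names (Members and MoveToCell are illustrative):
class CollisionCell
{
    public readonly HashSet<GameObject> Members = new HashSet<GameObject>();
    // ... bounds, etc. ...
}

// In the cell manager: called only when an object actually changes cell.
public void MoveToCell(GameObject obj, CollisionCell newCell)
{
    if (obj.CurrCell == newCell) return; // same cell: nothing to do
    if (obj.CurrCell != null) obj.CurrCell.Members.Remove(obj);
    if (newCell != null) newCell.Members.Add(obj);
    obj.CurrCell = newCell;
}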
Here is another optimization tip for determining which cells an object is in:
If you have already determined which cell(s) an object is in, and you know from the object's velocity that it will not change cells this frame, there is no need to rerun the logic that determines its cells. You can do a quick check by creating a bounding box that contains all the cells the object is in, and another bounding box that is the size of the object plus its velocity for the current frame. If the cell bounding box contains the object-plus-velocity bounding box, no further checks need to be done. If the object isn't moving, it's even easier: just use the object's bounding box.
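A rough sketch of that check, using XNA's Rectangle (cellUnion is assumed to be the union of the cells the object currently occupies):
static bool StaysInCells(Rectangle cellUnion, Rectangle objBounds, Vector2 velocity)
{
    // grow the object's box by this frame's motion (conservatively, in every direction)
    Rectangle moved = objBounds;
    moved.Inflate((int)Math.Ceiling(Math.Abs(velocity.X)),
                  (int)Math.Ceiling(Math.Abs(velocity.Y)));
    // if the cells still contain the grown box, the object cannot change cells this frame
    return cellUnion.Contains(moved);
}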
Let me know if that makes sense, or google / bing search for "Quad Tree", or if you don't mind using open source code, check out this awesome physics library: http://www.codeplex.com/FarseerPhysics
I'm in the exact same boat as you. I'm trying to create an overhead shooter and need to push efficiency to the max so I can have tons of bullets and enemies on screen at once.
I'd get all of my collidable objects into an array with a numbered index. This affords the opportunity to take advantage of an observation: if you iterate over the full list for each item, you'll be duplicating effort. That is (note, I'm making up variable names just to make the pseudo-code easier to follow):
if (objs[49].Intersects(objs[51]))
is equivalent to:
if (objs[51].Intersects(objs[49]))
So if you use a numbered index you can save some time by not duplicating efforts. Do this instead:
for (int i1 = 0; i1 < collidables.Count; i1++)
{
    // By setting i2 = i1 + 1 you ensure an object isn't checked against itself,
    // and that pairs already checked against i1 aren't checked again. For instance,
    // collidables[4] doesn't need to check against collidables[0] again, since that
    // pair was checked earlier.
    for (int i2 = i1 + 1; i2 < collidables.Count; i2++)
    {
        // Check collisions here
    }
}
Also, I'd have each cell keep either a count or a flag that tells you whether you even need to check it for collisions: if the flag isn't set, or the count is less than 2, there's no need to check that cell.
Just a heads-up: some people suggest Farseer, which is a great 2D physics library for use with XNA. If you're in the market for a 3D physics engine for XNA, I've used BulletX (a C# port of Bullet) in XNA projects to great effect.
Note: I have no affiliation to the bullet or bulletx projects.
An idea might be to use a bounding circle. Basically, when a Collidable is created, keep track of its centre point and calculate a radius/diameter that contains the whole object. You can then do a first-pass elimination using something like:
int r = C1.BoundingRadius + C2.BoundingRadius;
if (Math.Abs(C1.X - C2.X) > r || Math.Abs(C1.Y - C2.Y) > r)
    // Skip further checks...
(note the ||: if the centres are more than r apart on either axis alone, the circles cannot touch). This drops the comparisons to one or two for most pairs, but how much this will gain you I'm not sure... profile!
There are a couple of things that could be done to speed up the process... but as far as I can see, your method of checking simple rectangular collision is just fine.
But I'd replace the check
if (obj.Position.X ....)
with
if (obj.Bounds.Intersects(this.Bounds))
And I'd also replace the line
result.Add(new List<GameObject>(cell.Containing.ToArray()));
with
result.Add(new List<GameObject>(cell.Containing));
since the Containing property returns an ICollection<T>, which implements the IEnumerable<T> accepted by the List<T> constructor.
The ToArray() method simply iterates the list to build an array, and that work is done all over again when constructing the new list.
I know this thread is old, but I would say that the marked answer is completely wrong...
Its code contains a fatal error and brings no performance improvement; it costs performance!
First, a little notice...
That code is written so that you have to call it from your Draw method, but that is the wrong place for collision detection. In your Draw method you should only draw, nothing else!
But you can't simply call HandleCollisions() in Update either, because Update gets many more calls than Draw.
If you want to call HandleCollisions(), your code has to look like this. This code prevents your collision detection from running more than once per frame:
private bool check = false;

protected override void Update(GameTime gameTime)
{
    if (!check)
    {
        check = true;
        HandleCollisions();
    }
}

protected override void Draw(GameTime gameTime)
{
    check = false;
}
Now let us take a look at what's wrong with HandleCollisions().
Example: we have 500 objects and we check every possible collision without optimizing our detection.
With 500 objects we should get 249,500 collision checks (499 x 500, because we don't want to check whether an object collides with itself).
But with Frank's code above we will lose 99.998% of those collisions: only 500 collision checks ever get done! << THAT is why it seems faster.
Why? Because _lastCheckedCollision will never be equal to or greater than _allPossibleCollisions.Length, so after the first full pass you only ever check the last index, 499:
for (var i = _lastCheckedCollision; i < _allPossibleCollisions.Length; i++)
    _lastCheckedCollision = i;
// << _lastCheckedCollision can never reach _allPossibleCollisions.Length,
// because i always stays below _allPossibleCollisions.Length
You have to replace this:
if (_allPossibleCollisions == null ||
    _lastCheckedCollision >= _allPossibleCollisions.Length)
with this:
if (_allPossibleCollisions == null ||
    _lastCheckedCollision >= _allPossibleCollisions.Length - 1) {
so your whole code can be replaced by this:
private bool check = false;

protected override void Update(GameTime gameTime)
{
    if (!check)
    {
        check = true;
        _allPossibleCollisions = GenerateAllPossibleCollisions();
        for (int i = 0; i < _allPossibleCollisions.Length; i++)
        {
            if (CheckCollision(_allPossibleCollisions[i]))
            {
                // Collision!
            }
        }
    }
}

protected override void Draw(GameTime gameTime)
{
    check = false;
}
... this should be a lot faster than your code ... and it works :D ...
RCIX's answer should be marked as correct, because Frank's answer is wrong.

C# Micro-Optimization Query: IEnumerable Replacement

Note: I'm optimizing because of past experience and due to profiler software's advice. I realize an alternative optimization would be to call GetNeighbors less often, but that is a secondary issue at the moment.
I have a very simple function described below. In general, I call it within a foreach loop. I call that function a lot (about 100,000 times per second). A while back, I coded a variation of this program in Java and was so disgusted by the speed that I ended up replacing several of the for loops which used it with 4 if statements. Loop unrolling seems ugly, but it did make a noticeable difference in application speed. So, I've come up with a few potential optimizations and thought I would ask for opinions on their merit and for suggestions:
Use four if statements and totally ignore the DRY principle. I am confident this will improve performance based on past experience, but it makes me sad. To clarify, the 4 if statements would be pasted anywhere I currently call getNeighbors() too frequently, with the inside of the foreach block pasted within them.
Memoize the results in some mysterious manner.
Add a "neighbors" property to all squares. Generate its contents at initialization.
Use a code generation utility to turn calls to GetNeighbors into if statements as part of compilation.
public static IEnumerable<Square> GetNeighbors(Model m, Square s)
{
    int x = s.X;
    int y = s.Y;
    if (x > 0) yield return m[x - 1, y];
    if (y > 0) yield return m[x, y - 1];
    if (x < m.Width - 1) yield return m[x + 1, y];
    if (y < m.Height - 1) yield return m[x, y + 1];
    yield break;
}

// The property of Model used to get elements.
private Square[,] grid;
// ...
public Square this[int x, int y]
{
    get
    {
        return grid[x, y];
    }
}
Note: 20% of the time spent in the GetNeighbors function goes to the call to m.get_Item; the other 80% is spent in the method itself.
Brian,
I've run into similar things in my code.
The two things I've found with C# that helped me the most:
First, don't be afraid necessarily of allocations. C# memory allocations are very, very fast, so allocating an array on the fly can often be faster than making an enumerator. However, whether this will help depends a lot on how you're using the results. The only pitfall I see is that, if you return a fixed size array (4), you're going to have to check for edge cases in the routine that's using your results.
Depending on how large your matrix of Squares is in your model, you may be better off doing 1 check up front to see if you're on the edge, and if not, precomputing the full array and returning it. If you're on an edge, you can handle those special cases separately (make a 1 or 2 element array as appropriate). This would put one larger statement in there, but that is often faster in my experience. If the model is large, I would avoid precomputing all of the neighbors. The overhead in the Squares may outweigh the benefits.
In my experience, as well, preallocating and returning vs. using yield makes the JIT more likely to inline your function, which can make a big difference in speed. If you can take advantage of the IEnumerable results and you are not always using every returned element, that is better, but otherwise, precomputing may be faster.
The other thing to consider - I don't know what information is saved in Square in your case, but if the object is relatively small, and is used in a large matrix and iterated over many, many times, consider making it a struct. I had a routine similar to this (called hundreds of thousands or millions of times in a loop), and changing the class to a struct, in my case, sped up the routine by over 40%. This assumes you're using .NET 3.5 SP1, though, as the JIT does many more optimizations on structs in the latest release.
There are other potential pitfalls to switching to struct vs. class, of course, but it can have huge performance impacts.
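For illustration, the struct version might be as simple as this (assuming Square really is just coordinates plus small value-type fields; watch out for the usual mutable-struct and reference-semantics pitfalls):
public struct Square
{
    public int X;
    public int Y;

    public Square(int x, int y)
    {
        X = x;
        Y = y;
    }
}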
I'd suggest making an array of Squares (capacity four) and returning that instead. I would be very suspicious about using iterators in a performance-sensitive context. For example:
// could still return IEnumerable<Square> instead if you preferred.
public static Square[] GetNeighbors(Model m, Square s)
{
    int x = s.X, y = s.Y, i = 0;
    var result = new Square[4];
    if (x > 0) result[i++] = m[x - 1, y];
    if (y > 0) result[i++] = m[x, y - 1];
    if (x < m.Width - 1) result[i++] = m[x + 1, y];
    if (y < m.Height - 1) result[i++] = m[x, y + 1];
    return result;
}
I wouldn't be surprised if that's much faster.
I'm on a slippery slope, so insert disclaimer here.
I'd go with option 3. Fill in the neighbor references lazily and you've got a kind of memoization.
Another kind of memoization would be to return an array instead of a lazy IEnumerable, so that GetNeighbors becomes a pure function that is trivial to memoize. This amounts roughly to option 3, though.
In any case, but you know this: profile and re-evaluate every step of the way. I am, for example, unsure about the trade-off between the lazy IEnumerable and returning an array of results directly (you avoid some indirection but need an allocation).
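A sketch of that memoization, assuming a static model and the array-returning GetNeighbors from the earlier answer (the cache field and method name are hypothetical):
private static Square[][] _neighborCache; // one entry per cell, index = y * Width + x

public static Square[] GetNeighborsMemoized(Model m, Square s)
{
    if (_neighborCache == null)
        _neighborCache = new Square[m.Width * m.Height][];
    int index = s.Y * m.Width + s.X;
    if (_neighborCache[index] == null)
        _neighborCache[index] = GetNeighbors(m, s); // compute once, reuse thereafter
    return _neighborCache[index];
}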
Why not make the Square class responsible for returning its neighbours? Then you have an excellent place to do lazy initialisation without the extra overhead of memoization:
public class Square {
    private Model _model;
    private int _x;
    private int _y;
    private Square[] _neighbours;

    public Square(Model model, int x, int y) {
        _model = model;
        _x = x;
        _y = y;
        _neighbours = null;
    }

    public Square[] Neighbours {
        get {
            if (_neighbours == null) {
                _neighbours = GetNeighbours();
            }
            return _neighbours;
        }
    }

    private Square[] GetNeighbours() {
        int len = 4;
        if (_x == 0) len--;
        if (_x == _model.Width - 1) len--;
        if (_y == 0) len--;
        if (_y == _model.Height - 1) len--;
        Square[] result = new Square[len];
        int i = 0;
        if (_x > 0) {
            result[i++] = _model[_x - 1, _y];
        }
        if (_x < _model.Width - 1) {
            result[i++] = _model[_x + 1, _y];
        }
        if (_y > 0) {
            result[i++] = _model[_x, _y - 1];
        }
        if (_y < _model.Height - 1) {
            result[i++] = _model[_x, _y + 1];
        }
        return result;
    }
}
Depending on the use of GetNeighbors, maybe some inversion of control could help:
public static void DoOnNeighbors(Model m, Square s, Action<Square> action) {
    int x = s.X;
    int y = s.Y;
    if (x > 0) action(m[x - 1, y]);
    if (y > 0) action(m[x, y - 1]);
    if (x < m.Width - 1) action(m[x + 1, y]);
    if (y < m.Height - 1) action(m[x, y + 1]);
}
But I'm not sure if this has better performance.
