I have a list of entities which implement an ICollidable interface. This interface is used to resolve collisions between entities. My entities are thus:
Players
Enemies
Projectiles
Items
Tiles
On each game update (about 60 t/s), I am clearing the list and adding the current entities based on the game state. I am accomplishing this via:
collidableEntities.Clear();
collidableEntities.AddRange(players);
collidableEntities.AddRange(enemies);
collidableEntities.AddRange(projectiles);
collidableEntities.AddRange(items);
collidableEntities.AddRange(camera.VisibleTiles);
Everything works fine until I add the visible tiles to the list. The first ~1-2 seconds of running the game loop causes a visible hiccup that delays drawing (so I can see a jitter in the rendering). I can literally remove/add the line that adds the tiles and see the jitter occur and not occur, so I have narrowed it down to that line.
My question is, why? The list of VisibleTiles is about 450-500 tiles, so it's really not that much data. Each tile contains a Texture2D (image) and a Vector2 (position) to determine what is rendered and where. I'm going to keep looking, but off the top of my head I can't understand why only the first 1-2 seconds hiccup and then everything is smooth from there on out.
Any advice is appreciated.
Update
I have tried increasing the initial capacity to an approximate amount of elements, but no difference was observed.
Update
As requested, here is the code for camera.VisibleTiles
public List<Tile> VisibleTiles
{
    get { return this.visibleTiles; }
}
Whenever you have a performance issue like this, the correct response is not to guess at what possible causes might be; the correct response is to profile the application. There are a number of options available; I'm sure you can find suggestions here on StackOverflow.
However, I'm feeling lucky today, so I'm going to ignore the advice I just gave you and take a wild guess. I will explain my reasoning, which I hope will be helpful in diagnosing such problems in the future.
The fact that the problem you describe only happens once and only at the beginning of gameplay leads me to believe that it's not anything inherent to the functionality of the list itself, or of the collision logic. This is assuming, of course, that the number of objects in the list is basically constant across this period of time. If it was caused by either of these, I would expect it to happen every frame.
Therefore I suspect the garbage collector. You're likely to see GC happening right at the beginning of the game, in other words right after you've loaded all of your massive assets into memory. You're also likely to see it happen at seemingly random points in the code, because any object you allocate can theoretically push it over the edge into a collection.
My guess is this: when you load your game, the assets you create are generating a large amount of collection pressure which is nonetheless not sufficient to trigger a collection. As you allocate objects during the course of gameplay (in this case, as a result of resizing the list), it is increasing the collection pressure to the point where the GC finally decides to activate, causing the jitter you observe. Once the GC has run, and all of the appropriate objects have been relegated to their correct generations, the jitter stops.
As I said, this is just a guess, albeit an educated one. It is, however, simple to test. Add a call to GC.Collect(2) prior to entering your render loop. If I'm right, the jitter should go away.
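For example (a minimal sketch; exactly where you put it is up to you, and the end of LoadContent is a natural spot in XNA):
// One-off test: force a full collection before the first frame. If the startup
// jitter disappears, GC pressure left over from asset loading was the cause.
GC.Collect(2);
GC.WaitForPendingFinalizers();
GC.Collect(2);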
If you want to be more thorough than that (and I highly recommend it), Microsoft provides a tool which is useful for debugging memory issues, the CLR Profiler. This tool will show you exactly when collections are occurring and why. It's very useful for XNA development.
The 1-2 seconds may be caused by the GC adjusting the sizes of each generation. Look at the corresponding perf counters and see whether there is a large number of Gen1/Gen2 collections in the first seconds (minimizing GC is a useful goal for games).
Additional random guess: all your objects are structs for whatever reason, and very large, so copying them takes a long time (this would not explain the smoothing out after the first seconds, though).
Notes:
Instead of merging the items into a single list, consider creating an iterator that avoids all the copying (see the sketch below).
Get a profiler.
Learn to look at the perf counters for your process. Interesting categories are memory, GC, CPU and handles.
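A minimal sketch of the iterator idea, reusing the collection names from the question; since the tiles and other entities all implement ICollidable, they can be yielded lazily instead of copied into one list every frame:
IEnumerable<ICollidable> EnumerateCollidables()
{
    foreach (var player in players) yield return player;
    foreach (var enemy in enemies) yield return enemy;
    foreach (var projectile in projectiles) yield return projectile;
    foreach (var item in items) yield return item;
    foreach (var tile in camera.VisibleTiles) yield return tile;
}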
You can use an overloaded constructor for List<T> to create the list with a predefined capacity:
var collidableEntities = new List<ICollidable>(500);
Related
When I say list, I mean List, array, HashTable, things like that where you can iterate through with an IndexOf method.
I have a program that wants to make many repeated lookups in its lists, which is obviously slow if you have a list format that iterates through every value starting at 0 every time you do an IndexOf (and, to a lesser extent, still slower than a direct reference when using hash tables for many lookups). Before I do this, I want to ask:
Is there an existing IEnumerable that is optimized for repeated lookups?
Assuming this hypothetical IEnumerable keeps a reference from the lookup (handing a key back to the object doing the lookup so it can be stored), is this even a good idea, given that it locks in items that the garbage collector could otherwise pick up? Is there a way to mark a field as "not important to keep in memory despite a reference existing", i.e. still garbage-collectible? (This isn't so much a problem I don't know how to fix as something I would rather implement properly than work around.)
If the answer to either of these is "no", then I'll make an IList that remembers the last index of a lookup, which wouldn't bog down the garbage collector.
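Roughly what I have in mind, as an untested sketch (the class name is made up):
using System.Collections.Generic;

// A list whose IndexOf remembers the last hit, so repeated lookups of the same
// item cost O(1) instead of a full scan.
class LastIndexCachingList<T> : List<T>
{
    private int lastIndex = -1;

    public new int IndexOf(T item)
    {
        if (lastIndex >= 0 && lastIndex < Count &&
            EqualityComparer<T>.Default.Equals(this[lastIndex], item))
            return lastIndex;
        return lastIndex = base.IndexOf(item);
    }
}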
The program I'm working on is a UI system where widgets tend to take up a decent amount of memory and change a lot, leaving the garbage collector as my only means of freeing memory. Willy-nilly references lying around are dangerous to account for.
Sorry for my last question, my code was so stupid.
My base situation is: I want to construct a state tree which has 8! items in the last level, so the total count of iterations is about 100,000 (8!*2 + 7! + 6! + ...).
It currently takes less than one second, and I need to construct it every time my artificial intelligence makes a move. Of course, alpha/beta search is a solution, but before thinking about that I want to optimize my code so I really have the best possible performance.
What I already did:
I tried to replace every LINQ function with precalculations or collections with faster access (Dictionary), added more precalculations to skip whole operations, used some approximations to avoid heavy calculations, and used the List constructor only when there's actually a change; if not, I just reuse the reference.
There'll be more calculations coming, so I really need more ideas for reducing the work, maybe something about which collection is fastest for my purpose.
My code
It's about the BuildChildNodes function and the TryCollect function it calls. My constructor does some small precalculations. My state tree knows everything, even the cards which aren't actually shown.
Since the comment came up: I'm not asking you to read and understand my code to give content-related advice. I'm asking about the functions, operators, data types and classes I'm using, and whether there is a replacement which runs a bit faster, e.g. if there's a faster collection for my purpose, or if you have a better idea for replacing the collection constructor with a faster approach with regard to the adding and removing that happens afterwards.
Edit: Okay, List is definitely the best type I can use. I tried plain arrays and even Dictionaries, and last of all I even tried LinkedLists, all with a significant loss.
I can see that RemoveAt() could be expensive, as it is proportional to the size of the list.
You can always use the Visual Studio performance profiler to find out where you should optimize your code the most.
If you can find a way to use fixed-size arrays that you allocate when your program starts, instead of dynamically allocated data structures like List, you will save a lot on memory allocation management overhead.
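For instance (illustrative only; Node and MaxNodes are placeholders, not names from the question):
// Allocate the whole working set once at startup and reuse it, instead of
// letting a List<T> grow (and copy its backing array) in the middle of a search.
const int MaxNodes = 100000;             // assumed upper bound (roughly 8!*2 + 7! + ...)
Node[] nodeBuffer = new Node[MaxNodes];  // Node stands in for your actual node type
int nodeCount = 0;                       // number of slots currently in use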
We have a problem which seems to be caused by the constant allocation and deallocation of memory:
We have a rather complex system here, where a USB device is measuring arbitrary points and sending the measurement data to the PC at a rate of 50k samples per second. These samples are then collected as MeasurementTasks in the software for each point and afterwards processed, which requires even more memory because of the calculations involved.
Simplified each MeasurementTask looks like the following:
public class MeasurementTask
{
public LinkedList<Sample> Samples { get; set; }
public ComplexSample[] ComplexSamples { get; set; }
public Complex Result { get; set; }
}
Where Sample looks like:
public class Sample
{
public ushort CommandIndex;
public double ValueChannel1;
public double ValueChannel2;
}
and ComplexSample like:
public class ComplexSample
{
public double Channel1Real;
public double Channel1Imag;
public double Channel2Real;
public double Channel2Imag;
}
In the calculation process the Samples are first converted into a ComplexSample each and then further processed until we get our Complex Result. After these calculations are done we release all the Sample and ComplexSample instances and the GC cleans them up soon after, but this results in a constant "up and down" of the memory usage.
This is how it looks at the moment with each MeasurementTask containing ~300k samples:
Now we sometimes have the problem that the sample buffer in our HW device overflows, as it can only store ~5000 samples (~100 ms) and it seems the application is not always reading the device fast enough (we use BULK transfer with LibUSB/LibUSBDotNet). We tracked this problem down to the "memory up and down" based on the following facts:
the reading from the USB device happens in its own thread which runs at ThreadPriority.Highest, so the calculations should not interfere
CPU usage is between 1-5% on my 8-core CPU => <50% of one core
if we have (much) faster MeasurementTasks with only a few hundred samples each, the memory only goes up and down very little and the buffer never overflows (but the number of instances per second is the same, as the device still sends 50k samples/second)
we had a bug before which did not release the Sample and ComplexSample instances after the calculations, so the memory only went up at ~2-3 MB/s and the buffer overflowed all the time
At the moment (after fixing the bug mentioned above) we have a direct correlation between the sample count per point and the overflows. More samples/point = higher memory delta = more overflows.
Now to the actual question:
Can this behaviour be improved (easily)?
Maybe there is a way to tell the GC/runtime to not release the memory so there is no need to re-allocate?
We also thought of an alternative approach by "re-using" the LinkedList<Sample> and ComplexSample[]: Keep a pool of such lists/arrays and instead of releasing them put them back in the pool and "change" these instances instead of creating new ones, but we are not sure this is a good idea as it adds complexity to the whole system...
But we are open to other suggestions!
UPDATE:
I now optimized the code base with the following improvements and did various test runs:
converted Sample to a struct
got rid of the LinkedList<Sample> and replaced it with plain arrays (I actually had another one somewhere else that I also removed); see the sketch after this list
several minor optimizations I found during analysis and optimization
(optional - see below) converted ComplexSample to a struct
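In code, the first two changes amount to roughly this (a sketch; sampleCount stands in for however many samples the task will hold):
public struct Sample
{
    public ushort CommandIndex;
    public double ValueChannel1;
    public double ValueChannel2;
}

// per task, the samples now live in one flat array instead of a LinkedList
Sample[] samples = new Sample[sampleCount];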
In any case it seems that the problem is gone now on my machine (long-term tests and tests on low-spec hardware will follow), but I first ran a test with both types as structs and got the following memory usage graph:
There it was still going up to ~300 MB on a regular basis (but with no overflow errors anymore), but as this still seemed odd to me I did some additional tests:
Side note: Each value of each ComplexSample is altered at least once during the calculations.
1) Add a GC.Collect after a task is processed and the samples are not referenced any more:
Now it was alternating between 140 MB and 150 MB (no noticeable performance hit).
2) ComplexSample as a class (no GC.Collect):
Using a class it is much more "stable" at ~140-200 MB.
3) ComplexSample as a class and GC.Collect:
Now it is going "up and down" a little in the range of 135-150 MB.
Current solution:
As we are not sure this is a valid case for manually calling GC.Collect we are using "solution 2)" now and I will start running the long-term (= several hours) and low-spec hardware tests...
Can this behaviour be improved (easily)?
Yes (depends on how much you need to improve it).
The first thing I would do is to change Sample and ComplexSample to be value types. This will reduce the complexity of the object graph the GC has to deal with: while the arrays and linked lists are still collected, they contain the values directly rather than references to them, and that simplifies the rest of the GC's work.
Then I'd measure performance at this point. The impact of working with relatively large structs is mixed. The guideline that value types should be less than 16 bytes comes from it being around that point where the performance benefits of using a reference type tend to overwhelm the performance benefits of using a value type, but that guideline is only a guideline because "tend to overwhelm" is not the same as "will overwhelm in your application".
After that, if it either had not improved things or not improved them enough, I would consider using a pool of objects, whether for the smaller objects, only the larger objects, or both. This will most certainly increase the complexity of your application, but if it's time-critical, then it might well help. (See "How do these people avoid creating any garbage?" for an example discussion of avoiding normal GC in a time-critical case.)
If you know you'll need a fixed maximum of a given type, this isn't too hard: create and fill an array of them up front and dole them out from that array, putting them back as they are no longer used. It's still hard enough in that you no longer have automatic GC and have to manually "delete" the objects by returning them to the pool.
If you don't have such knowledge, it gets harder but is still possible.
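A minimal sketch of that fixed-maximum case, using the Sample type from the question (the pool class and its sizes are assumptions, not the actual code):
using System.Collections.Generic;

// Pre-allocates a fixed number of sample buffers; "deleting" a buffer simply
// means returning it to the pool, so no garbage is produced per MeasurementTask.
class SampleBufferPool
{
    private readonly Stack<Sample[]> free = new Stack<Sample[]>();

    public SampleBufferPool(int bufferCount, int samplesPerBuffer)
    {
        for (int i = 0; i < bufferCount; i++)
            free.Push(new Sample[samplesPerBuffer]);
    }

    public Sample[] Rent() { return free.Pop(); }              // throws if the pool is exhausted
    public void Return(Sample[] buffer) { free.Push(buffer); }
}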
If it is really vital that you avoid GC, be careful of hidden objects. Adding to most collection types can for example result in them moving up to a larger internal store, and leaving the earlier store to be collected. Maybe this is fine in that you've still reduced GC use enough that it is no longer causing the problem you have, but maybe not.
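A concrete example of the kind of hidden allocation meant here (expectedCount is assumed to be known up front):
// A List<T> that grows discards its old internal array each time it doubles;
// giving it its final capacity up front avoids that garbage entirely.
var complexSamples = new List<ComplexSample>(expectedCount);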
I've rarely seen a LinkedList<> used in .NET... Have you tried using a List<>? Consider that the basic "element" of a LinkedList<> is a LinkedListNode<>, which is a class, so for each Sample there is the additional overhead of one whole object.
Note that if you want to use "big" value types (as suggested by others), the List<> could again become slower, because a List<> grows by generating a new internal array of double the current size and copying from the old one to the new one; so the bigger the elements, the more memory the List<> has to copy around when it doubles itself.
If you go to List<>, you could try splitting the Sample into
List<ushort> CommandIndex;
List<Sample> ValueChannels;
This is because the doubles of Sample require 8-byte alignment, so as written the Sample is 24 bytes with only 18 bytes used; with the ushort moved into its own list, the remaining two doubles pack into exactly 16 bytes.
This wouldn't be a good idea for LinkedList<>, because the LL has a big overhead per item.
Change Sample and ComplexSample to struct.
Ok, I want to do the following to me it seems like a good idea so if there's no way to do what I'm asking, I'm sure there's a reasonable alternative.
Anyways, I have a sparse matrix. It's pretty big and mostly empty. I have a class called MatrixNode that's basically a wrapper around each of the cells in the matrix. Through it you can get and set the value of that cell. It also has Up, Down, Left and Right properties that return a new MatrixNode that points to the corresponding cell.
Now, since the matrix is mostly empty, having a live node for each cell, including the empty ones, is an unacceptable memory overhead. The other solution is to make new instances of MatrixNode every time a node is requested. This will make sure that only the needed nodes are kept in the memory and the rest will be collected. What I don't like about it is that a new object has to be created every time. I'm scared about it being too slow.
So here's what I've come up with. Have a dictionary of weak references to nodes. When a node is requested, if it doesn't exist, the dictionary creates it and stores it as a weak reference. If the node does already exist (probably referenced somewhere), it just returns it.
Then, if the node doesn't have any live references left, instead of it being collected I want to store it in a pool. Later, when a new node is needed, I want to first check if the pool is empty and only make a new node if there isn't one already available that can just have its data swapped out.
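Roughly the scheme I'm describing, as a sketch (it uses the MatrixNode from above, but the cache class, key packing and constructor are made up, and the pool part is left out):
using System;
using System.Collections.Generic;

// The cache holds only weak references, so it never keeps a node alive by
// itself; only outside references do.
class MatrixNodeCache
{
    private readonly Dictionary<long, WeakReference<MatrixNode>> cache =
        new Dictionary<long, WeakReference<MatrixNode>>();

    public MatrixNode Get(int row, int col)
    {
        long key = ((long)row << 32) | (uint)col;   // pack (row, col) into one key
        WeakReference<MatrixNode> weak;
        MatrixNode node;
        if (cache.TryGetValue(key, out weak) && weak.TryGetTarget(out node))
            return node;                            // still referenced somewhere, reuse it

        node = new MatrixNode(row, col);            // hypothetical constructor; or take one from the pool
        cache[key] = new WeakReference<MatrixNode>(node);
        return node;
    }
}
(Dead dictionary entries whose targets have been collected would still need to be pruned occasionally.)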
Can this be done?
A better question would be, does .NET already do this for me? Am I right in worrying about the performance of creating single use objects in large numbers?
Instead of guessing, you should make a performance test to see if there are any issues at all. You may be surprised to know that managed memory allocation can often outperform explicit allocation because your code doesn't have to pay for deallocation when your data goes out of scope.
Performance may become an issue only when you are allocating new objects so frequently that the garbage collector has no chance to collect them.
That said, there are sparse array implementations in C# already, like Math.NET and MetaNumerics. These libraries are already optimized for performance and will probably avoid the performance issues you would run into if you started your implementation from scratch.
An SO search for c# and sparse-matrix will return many related questions, including answers pointing to commercial libraries like ILNumerics (has a community edition), NMath and Extreme Optimization's libraries
Most sparse matrix implementations use one of a few well-known schemes for their data; I generally recommend CSR or CSC, as those are efficient for common operations.
If that seems too complex, you can start using COO. What this means in your code is that you will not store anything for empty members; however, you have an item for every non-empty one. A simple implementation might be:
public struct SparseMatrixItem
{
    public int Row;
    public int Col;
    public double Value;
}
And your matrix would generally be a simple container:
public interface SparseMatrix
{
    IList<SparseMatrixItem> Items { get; }
}
You should make sure that the Items list stays sorted according to the row and col indices, because then you can use binary search to quickly find out if an item exists for a specific (i,j).
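For example, assuming Items is backed by a List<SparseMatrixItem> (called items below) and kept sorted by (Row, Col), and (i, j) is the cell being looked up:
// Comparer that orders items by Row first, then Col.
var byRowCol = Comparer<SparseMatrixItem>.Create((a, b) =>
    a.Row != b.Row ? a.Row.CompareTo(b.Row) : a.Col.CompareTo(b.Col));

// Binary search for cell (i, j); a negative result means the cell is empty,
// and its bitwise complement (~index) is where a new item would be inserted.
int index = items.BinarySearch(new SparseMatrixItem { Row = i, Col = j }, byRowCol);
bool cellExists = index >= 0;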
The idea of having a pool of objects that people use and then return to the pool is used for really expensive objects. Objects representing a network connection, a new thread, etc. It sounds like your object is very small and easy to create. Given that, you're almost certainly going to harm performance pooling it; the overhead of managing the pool will be greater than the cost of just creating a new one each time.
Having lots of short lived very small objects is the exact case that the GC is designed to handle quickly. Creating a new object is dirt cheap; it's just moving a pointer up and clearing out the bits for that object. The real overhead for objects comes in when a new garbage collection happens; for that it needs to find all "alive" objects and move them around, leaving all "dead" objects in their place. If your small object doesn't live through a single collection it has added almost no overhead. Keeping the objects around for a long time (like, say, by pooling them so you can reuse them) means copying them through several collections, consuming a fair bit of resources.
I found a memory leak in my XNA 4.0 application written in C#. The program needs to run for a long time (days) but it runs out of memory over the course of several hours and crashes. Opening Task Manager and watching the memory footprint, every second another 20-30 KB of memory is allocated to my program until it runs out. I believe the memory leak occurs when I set the BasicEffect.Texture property because that is the statement that finally throws the OutOfMemory exception.
The program has around 300 large (512px) textures stored in memory as Texture2D objects. The textures are not square or even powers of 2 - e.g. can be 512x431 - one side is always 512px. These objects are created only at initialization, so I am fairly confident it is not caused by creating/destroying Texture2D objects dynamically. Some interface elements create their own textures, but only ever in a constructor, and these interface elements are never removed from the program.
I am rendering texture mapped triangles. Before each object is rendered with triangles, I set the BasicEffect.Texture property to the already created Texture2D object and the BasicEffect.TextureEnabled property to true. I apply the BasicEffect in between each of these calls with BasicEffect.CurrentTechnique.Passes[0].Apply() - I'm aware that I'm calling Apply() twice as much as I should, but the code is wrapped inside of a helper class that calls Apply() whenever any property of BasicEffect changes.
I am using a single BasicEffect class for the entire application and I change its properties and call Apply() any time I render an object.
First, could it be that changing the BasicEffect.Texture property and calling Apply() so many times is leaking memory?
Second, is this the proper way to render triangles with different textures? E.g. using a single BasicEffect and updating its properties?
This code is taken from a helper class so I've removed all the fluff and only included the pertinent XNA calls:
// single BasicEffect object for the entire application
BasicEffect effect = new BasicEffect(graphicsDevice);

// loaded from file at initialization (before any Draw() is called)
Texture2D texture1 = Texture2D.FromStream(graphicsDevice, File.OpenRead("image1.jpg"));
Texture2D texture2 = Texture2D.FromStream(graphicsDevice, File.OpenRead("image2.jpg"));

// render object 1
if (effect.Texture != texture1)    // effect.Texture eventually throws the OutOfMemory exception
    effect.Texture = texture1;
effect.CurrentTechnique.Passes[0].Apply();
effect.TextureEnabled = true;
effect.CurrentTechnique.Passes[0].Apply();
graphicsDevice.DrawUserIndexedPrimitives(PrimitiveType.TriangleList, vertices1, 0, numVertices1, indices1, 0, numTriangles1);

// render object 2
if (effect.Texture != texture2)
    effect.Texture = texture2;
effect.CurrentTechnique.Passes[0].Apply();
effect.TextureEnabled = true;
effect.CurrentTechnique.Passes[0].Apply();
graphicsDevice.DrawUserIndexedPrimitives(PrimitiveType.TriangleList, vertices2, 0, numVertices2, indices2, 0, numTriangles2);
It's an XNA application, so 60 times per second I call my Draw method, which renders all my various interface elements. This means that I could be drawing between 100-200 textures per frame, and the more textures I draw, the faster I run out of memory, even though I am not calling new anywhere in the update/draw loops. I am more experienced with OpenGL than DirectX, so clearly something is happening behind the scenes that is creating unmanaged memory I'm not aware of.
The only suggestion I can come up with is to group your textures into atlases rather than keeping each one separate. It will speed up your rendering time and reduce the load on the GPU, provided that you are rendering a massive number of textures per frame. The reason is that the GPU will not have to swap textures so often while rendering, which is an expensive operation. I am drawing on my OpenGL knowledge here, but XNA is based on DirectX and I assume it handles textures in a similar manner (unless you are using MonoGame, which lets you use OpenGL).
That said, you did not provide very much information. The memory leak can be coming from the texture switching, but it can also be coming from somewhere else. Most of your memory is occupied with textures, which is probably why you are getting a crash there rather than somewhere else. I am guessing a few things are happening here:
The garbage collector is not working fast enough to pick up all the RAM being allocated inside the rendering functions
You have a memory leak somewhere else in your code, and it is showing up here instead
Once again, it is difficult to figure out what is going on here given how little I know about your code. But to the best of my ability, I have a few suggestions for you:
Run through your code and look at how you are referencing things. Make sure that you don't have any lingering temporary references inside classes and structs. If you use something, pass it around to different classes and later consider it "discarded", there is a good chance that something is still holding onto that object, preventing it from being collected.
Do a search for all the "new" keywords you have in the solution. If something is constantly being created with "new", it can create a massive amount of objects on the heap. The garbage collector SHOULD be picking them up, but I wouldn't say I trust the garbage collector very much; worst case, it is not coming around often enough to deal with the churn.
Find ways to reduce the size of your textures. Atlasing is a solution that reduces the overhead of packaging each texture in its own Texture2D (see the sketch after these suggestions). It is a bit more work because you have to select sub-images from a shared texture, but in your case it may very well be worth it.
If you are convinced that there is an issue within XNA, try a different implementation called Monogame. It follows the exact same structure as XNA but is maintained by a community. As a result, the guts of the libraries you are using have been rewritten and there is a good chance that whatever is destroying your heap has been fixed.
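To illustrate the atlas idea from above (the atlas size and region offsets are made up; this is not your code):
// With an atlas the texture is set once per batch, and individual images are
// selected purely through the texture coordinates on the vertices, so no
// per-object texture switch occurs.
effect.Texture = atlasTexture;              // one large Texture2D containing many images
effect.TextureEnabled = true;
effect.CurrentTechnique.Passes[0].Apply();

// UVs for a 512x431 region starting at (0, 0) inside a 2048x2048 atlas:
Vector2 uvTopLeft = new Vector2(0f, 0f);
Vector2 uvBottomRight = new Vector2(512f / 2048f, 431f / 2048f);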
My advice to you? If you are really familiar with OpenGL and what you are doing is fairly simple, then I would check out OpenTK. It is a thin binding layer that "ports" OpenGL into C#. All the commands are exactly the same, and you have the flexibility of using the entire .NET library for all the extra fluff.
I hope this helps!