I need a fast collection that maps a 2D int-typed point to a custom class in C#.
The collection needs to have:
Fast lookup (coords to custom class), adding a point if it does not exist
Fast removal of a range of key-points (outside a given rect). This actually rules out Dictionary<Point2D, ...>, as profiling showed this operation taking 35% of the entire frame time in my sample implementation :-(
EDIT: To stress: I want to remove all entries OUTSIDE of the given rect (to kill the unused cache)
The coordinates can take any int values (they are used to cache [almost] infinite isometric 2D map tiles near the camera in Unity).
The points will always be organized in a rect-like structure (I can relax this requirement to always follow a rect; actually I am using an isometric projection).
The structure itself is used for caching tile-specific data (like tile-transitions)
EDIT: Updated with outcome of discussion
You can use a sparse, static matrix for each "Chunk" in the cache and a cursor to represent the current viewport. You can then use either modulus math or a quadtree to access each chunk, depending on the specific use case.
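For illustration, here is a minimal sketch of the modulus-math variant (all names are hypothetical): a fixed grid of chunk slots that world chunk coordinates wrap into, so anything outside the current viewport is evicted implicitly by being overwritten rather than by an explicit remove pass.

class ChunkCache<T> where T : class {
    readonly int _size;               //grid is _size x _size chunks
    readonly T[,] _chunks;
    readonly (int x, int y)[,] _keys; //which world chunk occupies each slot

    public ChunkCache(int size) {
        _size = size;
        _chunks = new T[size, size];
        _keys = new (int, int)[size, size];
    }

    int Wrap(int v) { return ((v % _size) + _size) % _size; } //handles negative coords

    public T Get(int cx, int cy) {
        int ix = Wrap(cx), iy = Wrap(cy);
        //A slot only counts if it holds the chunk we asked for; otherwise
        //it is stale data left over from a previous viewport position.
        return _keys[ix, iy] == (cx, cy) ? _chunks[ix, iy] : null;
    }

    public void Set(int cx, int cy, T chunk) {
        int ix = Wrap(cx), iy = Wrap(cy);
        _chunks[ix, iy] = chunk;      //implicitly evicts whatever was here
        _keys[ix, iy] = (cx, cy);
    }
}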
Old Answer:
If they are uniformly spaced, then why do you need to hash at all? You could just use a matrix of objects, with null as the default value where nothing is cached.
Since you are storing objects, the array just holds references under the hood, so the memory footprint of the array isn't really affected by the null values.
If you truly need it to be infinite, nest the matrices with a quadtree and create some kind of "Chunk" system.
I think this is what you need: RTree
Here's an interesting data-structure conundrum that perhaps you all can help me with; for context, I am writing C#.
Context/Constraints:
I'm using a library (the new Unity ECS preview package, specifically) that allows me to store data in a very compact/efficient/native fashion for lightning-fast access and manipulation with no garbage collection. For some time it supported storing data in FixedArrays:
ComponentType.FixedArray<T>(int fixedCapacity) //pseudo-code
The API does not allow any sort of managed data to be stored in these arrays, for performance and safety reasons, which means they must all be linear (no nested arrays or multiple dimensions) and the data elements themselves must be extremely simple (primitives or directly serializable structs, no fancy LinkedLists or references to other data structures). I cannot use a HashTable or Dictionary or any other similar high-level data structure to solve this problem; I must use the provided data structure!
Problem:
I am trying to store a basic "Entity" object in the array, which has an associated 3D integer point coordinate. I want to access the structure with this coordinate in hand and retrieve/modify/delete my object at said coordinate. So it was a simple problem of accessing a linearly indexed, fixed-width array using 3D coordinates, made possible by a hashing function.
//Pseudo-Code, this is not the actual code itself.
//In actuality the Fixed Arrays are associated with Entities in an ECS system.
var myStructure = new ComponentType.FixedArray<Entity>(512);//define array
struct DataPair {
Entity entity;//the element we're storing
Vector3 threeDIntegerCoordinate;//from 1x1x1 to 8x8x8
}
//...
int lookupFunction(Vector3 coordinate) {...} //converts 3D coord to 1D linear index
DataPair exampleDataPair = new DataPair(...);
//Data WAS stored like this:
myStructure[lookupFunction(exampleDataPair.threeDIntegerCoordinate)] = exampleDataPair.entity;
//Extremely fast access/lookup time due to using coordinate as index value.
Basically, I generated a variable number of Entities (1 to 512, one 8x8x8 cube) and stored them by index in the FixedArray using a translation function that correlates a linear index value with every single 3D point coordinate. Lookup of a coordinate value in a fixed array was extremely fast: as simple as accessing an index.
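For reference, the translation function for an 8x8x8 cube can be as simple as this (a sketch assuming 0-based integer coordinates; the int overload is mine, the original takes a Vector3):

//Maps a 3D coordinate in 0..7 on each axis to a linear index in 0..511.
static int lookupFunction(int x, int y, int z) {
    return x + (y * 8) + (z * 64); //x + y*8 + z*8*8
}
//The inverse, if ever needed: x = i % 8; y = (i / 8) % 8; z = i / 64;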
However!
The package has been updated, and FixedArray has been replaced with a new DynamicBuffer data structure, which is variable-width. The same constraints apply to what data can be stored, but now, if the cube of Entities is sparse (not entirely full), the structure does not need to reserve space for the non-existent Entity references. This will drastically cut down on my memory usage, considering most cubes of Entities are not entirely full and I'm storing literally millions of these buffers in memory at a time. The buffer elements are indexed by integer. It is possible to use multiple DynamicBuffers at once (which means we could store the coordinates alongside the elements in two parallel buffers if necessary).
//New data structure provided. Variable-width! Also indexed linearly.
var myStructure = new ComponentType.DynamicBuffer<Entity>();
//Very similar to a C# List or a Java ArrayList; for example, it contains functions:
myStructure.Add(T element);
myStructure.AddRange(...);
myStructure.Length();
myStructure.resizeUninitialized(int capacity);
myStructure.Clear();
In essence: what is the most efficient way to store this variable number of elements in a dynamic, dimensionless data structure (similar to a List) while maintaining 3D coordinate-based indexing, without using complex nested data structures? My system is more performance-bound by lookup/access time than by memory space.
Possible Solutions:
The naive solution is just to make every DynamicBuffer's length equal to the maximum number of elements I would ever want to store (a 512-Entity volume), simulating FixedArrays. This would require minimal changes to my codebase and would let me access them by coordinate using my current translation function, but it would not take advantage of the space-saving features of the dynamic data structure. It would look like this:
//Naive Solution:
myStructure.resizeUninitialized(512); //resize buffer to 512 elements
//DynamicBuffer is now indexed identically to FixedArray
Entity elementToRetrieve = myStructure[lookupFunction(exampleDataPair.threeDIntegerCoordinate)];
My projected solution is to use two parallel DynamicBuffers: one with all the Entities, the other with all the 3D points. When I want to find an Entity by coordinate, I look up the 3D point in the coordinate buffer and use the index of that element to find the corresponding Entity in the primary buffer.
//Possible better solution:
myStructure1 = new ComponentType.DynamicBuffer<Entity>();
myStructure2 = new ComponentType.DynamicBuffer<Vector3>();
//to access an element:
Entity elementToRetrieve = myStructure1[myStructure2.Find(exampleDataPair.threeDIntegerCoordinate)];
//I would have to create this theoretical Find function.
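For concreteness, here is a sketch of that theoretical Find as a plain linear scan (shown over an array for self-containment; a DynamicBuffer is indexed the same way, and the tuple coordinate type is a stand-in):

//Linear scan over the parallel coordinate buffer; returns the shared index.
static int Find((int x, int y, int z)[] coords, int count, (int x, int y, int z) target) {
    //For at most 512 elements a branch-predictable linear scan is often
    //fast enough, and it needs no sorting at all.
    for (int i = 0; i < count; i++)
        if (coords[i] == target)
            return i; //same index into the parallel Entity buffer
    return -1; //not stored
}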
Cons of this solution:
Requires searching, which means it would probably also require sorting (see the sketch after this list).
Sorting would need to be performed every time the structure is significantly modified, which is going to add a LARGE amount of computational overhead.
I would need to write my own search/sort algorithms on top of an already extremely complicated data structure that is not designed to be searched/sorted (it is possibly not stored linearly in memory).
Locality of reference? It is very important that processor caching/speculative execution is preserved for high performance.
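If the search does end up needing to be cheaper than a linear scan, one option (a sketch, not from the actual API) is to keep the keys sorted by their linearized coordinate and binary-search them; inserting each new element in sorted position avoids a separate sort pass.

//sortedKeys[i] holds lookupFunction(coord) for the Entity at index i
//in the parallel Entity buffer.
static int BinaryFind(int[] sortedKeys, int count, int key) {
    int lo = 0, hi = count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (sortedKeys[mid] == key) return mid;
        if (sortedKeys[mid] < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1; //key not present
}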
How can I find a happy medium between the naive solution and a complex solution involving searching/sorting? Are there any data structures or algorithms that solve this problem that I'm just completely missing? Basically, I need to efficiently use a List like a Map.
Sorry if this is a really long question, but I wanted to get this right, this is my first ever post here on StackExchange. Please be gentle! Thanks for all your help!
I have an arbitrarily large IEnumerable of WPF Geometry objects; the boundaries of the Geometry objects are non-trivial. They don't represent simple geometric shapes like rectangles or circles; they are complex polygons. The list will never change once initially populated.
I then have a point, and I want to determine which Geometry contains that point.
List<Geometry> list = getList();
var point = new Point(x, y);
list.Any(g => g.Bounds.Contains(point) && g.FillContains(point));
This code works, but it's generally slow. The initial Bounds check is a short circuit that ends up being about 50% faster than going without it. I think the next layer of complexity would be to set up some sort of pre-rendered hit-map Dictionary.
Is there anything better that already exists in WPF to accomplish this task in a more performance-oriented fashion?
I ended up creating a custom class that uses the bounding box of each Geometry to perform a tiered lookup. The first tier uses the simple bounding-box test to narrow down the list of Geometry objects that need to be searched.
Each "bucket" was sized using the average size of all Geometries in the collection. This has problems in the general case, but since most of my Geometries are roughly the same size, it was a decent solution.
MSDN covers this best:
How to: Hit Test Geometry in a Visual
How to: Hit Test Using Geometry as a Parameter
I'm trying to construct a program in C# that generates a 3D model of a structure composed of beams, and then creates some views of the object (front, side, top and isometric).
As I don't need to draw surfaces (the edges are enough), I've been calculating each line to draw, and then drawing it with
GraphicObject.DrawLine(myPen, x1, y1, x2, y2)
This worked fine so far, but as I keep adding parts to the structure, refreshing the GraphicObject takes too much time. So I'm looking into line-visibility checks to reduce the number of lines to draw.
I've searched Wikipedia and some PDFs on the subject, but everything I found is oriented toward surfaces. So my question: is there a simplified algorithm to check the visibility of object edges, or should I go for a different approach, like considering surfaces?
Any suggestions would be appreciated, thanks for your help.
Additional notes/questions:
My current approach:
calculate every beam in a local axis (all vertices)
=> move them to their global position
=> create a list of point pairs (projected and scaled to the view)
=> GraphicObject.DrawLine over the list of point pairs
Would the whole thing be faster if I calculated the view by pixels rather than using the DrawLine method?
Screenshots follow with the type of structure it's going to do (not fully complete yet):
Structure view
Structure detail
There are two ways to improve the performance:
a) Switch the computation to the graphics card.
b) Use a kd-tree or some other similar data structure to quickly discard the non-visible edges.
In more detail:
For a): a lot of your computations multiply many vertices (vectors of length 3) by some matrix. CPUs are slow at this because they only do a couple of these operations at a time. Switching to a GPU, for example using CUDA, will let you do more of them in parallel, with a better memory-access infrastructure. You can also use OpenGL/DirectX/Vulkan or whatever to render the lines themselves, skipping the need to get the results back from the graphics card and whatever other hiccups get introduced by windowing code/libraries. This will improve performance in almost all cases.
For b): it only helps when you are not looking at the entire scene (when you are, you really do need to draw everything). In that case you can store your scene in a kd-tree or some other data structure and use it to quickly discard things that are definitely outside the view area. You usually need to intersect some cuboid with a pyramid/frustum, so there is more math involved.
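For a feel of the idea, here is the simplest possible form of b) in 2D (a sketch): after projecting a segment, reject it if its bounding box misses the viewport entirely. A kd-tree accelerates the same test by rejecting whole groups of edges at once.

using System;
using System.Drawing;

static bool IsVisible(PointF a, PointF b, RectangleF viewport) {
    float minX = Math.Min(a.X, b.X), maxX = Math.Max(a.X, b.X);
    float minY = Math.Min(a.Y, b.Y), maxY = Math.Max(a.Y, b.Y);
    //If the segment's box lies outside on any side, skip drawing it.
    return maxX >= viewport.Left && minX <= viewport.Right
        && maxY >= viewport.Top && minY <= viewport.Bottom;
}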
As a compromise that should help in large scenes where you want to see everything, consider adjusting the level of detail. From your example, the red beams across the top are composed of 8 or so components. If you are far enough away, you will not be able to distinguish the 8, so just draw one. This works great if you have a large number of rounded edges, as you can simplify a lot of them.
I understand that some matrices have a lot of data, while others are mostly 0's or empty. But what is the advantage of creating a SparseMatrix object to hold a sparsely populated matrix over a DenseMatrix object? They both seem to offer more or less the same operations as far as methods go.
I'm also wondering when you would use a plain Matrix object to hold data; are there any advantages or situations where it would be preferred over the other two?
For small matrices (e.g. less than 1000x1000), dense matrices work well. But in practice there are a lot of problems needing much larger matrices, where almost all values are zero (often with the non-zero values close to the diagonal). With sparse matrices it is possible to handle very large matrices in cases where the dense structure is infeasible (because it needs too much memory, or is far too expensive to compute with in CPU time).
Note that, as of today, the Math.NET Numerics direct matrix-decomposition methods are optimized for dense matrices only; use iterative solvers for sparse data instead.
Regarding types, in Math.NET Numerics v3 the hierarchy for double-valued matrices is as follows:
Matrix<double>
|- Double.Matrix
|- Double.DenseMatrix
|- Double.SparseMatrix
|- Double.DiagonalMatrix
With Matrix<T> I refer to the full type MathNet.Numerics.LinearAlgebra.Matrix<T>, with
Double.Matrix to MathNet.Numerics.LinearAlgebra.Double.Matrix, etc.
Matrix<double>: always declare all variables, properties and arguments using this generic type only. Indeed, in most cases this is the only type needed in user code.
Double.Matrix: do not use
Double.DenseMatrix: use for creating a dense matrix only - if you do not wish to use the builder (Matrix<double>.Build.Dense...)
Double.SparseMatrix: use for creating a sparse matrix only - if you do not wish to use the builder
Double.DiagonalMatrix: use for creating a diagonal matrix only - if you do not wish to use the builder
Each of them is optimized for its specific use. For example, the sparse matrix uses the CSR format:
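For example, typical construction through the builder looks like this (a short sketch against the v3 API):

using MathNet.Numerics.LinearAlgebra;

var M = Matrix<double>.Build;
Matrix<double> dense = M.Dense(100, 100);         //all 10,000 entries allocated
Matrix<double> sparse = M.Sparse(100000, 100000); //feasible: only non-zeros are stored
sparse[5, 5] = 2.0;                               //stored entries appear on assignment
//Declare everything as Matrix<double>; the storage scheme stays an
//implementation detail.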
Compressed sparse row (CSR or CRS)
CSR is effectively identical to the Yale Sparse Matrix format, except
that the column array is normally stored ahead of the row index array.
I.e. CSR is (val, col_ind, row_ptr), where val is an array of the
(left-to-right, then top-to-bottom) non-zero values of the matrix;
col_ind is the column indices corresponding to the values; and,
row_ptr is the list of value indexes where each row starts. The name
is based on the fact that row index information is compressed relative
to the COO format. One typically uses another format (LIL, DOK, COO)
for construction. This format is efficient for arithmetic operations,
row slicing, and matrix-vector products. See scipy.sparse.csr_matrix.
See the Wikipedia article on sparse matrix formats for more info.
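To make the format concrete, here is a tiny worked example (0-based indices):

//CSR arrays for the 4x4 matrix
//    [ 5 0 0 0 ]
//    [ 0 8 0 0 ]
//    [ 0 0 3 0 ]
//    [ 0 6 0 0 ]
double[] val    = { 5, 8, 3, 6 };    //non-zeros, left-to-right, top-to-bottom
int[]    colInd = { 0, 1, 2, 1 };    //column index of each value
int[]    rowPtr = { 0, 1, 2, 3, 4 }; //where each row starts in val
//Row i occupies val[rowPtr[i] .. rowPtr[i+1]-1], e.g. row 3 is {6} in column 1.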
I am creating large-scale worlds using 16*16*16 voxel chunks, stacked up to 32*32*32 chunks in dimension, and I have hit a bit of a bump in the road, so to speak.
I want to create large structures that span 20+*20+*20+ chunks in volume, built from procedurally generated structures as well as templates for some of the content. Now I have an issue: the visual render range is up to 32*32*32 chunks, while I hold up to maybe 40*40*40 chunks in memory at a time when possible.
The structures can be anything, like towns, dungeons, and roads. I was thinking of something like Perlin worms for roads: just lay them over the terrain in x,z and then analyze the path for bridges etc.
The structures and collections of structures need to be pre-generated before the player is within visual range, or to work more like Perlin noise does for heightmaps (the best solution), so players don't see the generator at work. They also need to be consistent with the world seed every time.
I have thought about this a bit and have 2 possible solutions.
1) Generate the structures based on a point of origin for the structure generator.
This causes several issues, though, as even if I generate from the center of the structure, the structures can easily cross into the potential visual range of the player.
2) Pre-Generate "unreachable" chunks and then page them in and out in order to generate the structures using the above method.
This also seems rather unnecessary.
Both methods need to analyze the terrain in large quantities to find a valid location to spawn the structures.
I was hoping somebody might have a more organic solution, or even just a simpler solution that doesn't require me to "look" so far ahead.
Thank you in advance.
EDIT:
I had an idea for dungeon generation in which I generate point clouds/nodes for rooms.
Steps:
1) When the generator finds a "node", it creates an x, y and z size to make a box, basing it on the origin point of the room** (centre or corner of the room) and the room type.
**x,y,z relative to 0,0,0 worldspace, calculated like so: new Vector3((chunkX*16)+voxelX, (chunkY*16)+voxelY, (chunkZ*16)+voxelZ) (the inverse is sketched after these steps)
2) Once a room size is calculated, check for overlaps, and if one is found, do one of several things.
If the overlapping room is higher up, lower it down until either the roof or the floor is flush. If the roof is flush, build stairs up to the room and remove the walls that intersect.
3) Look down, north and east for a room, maybe within a small cone, and attempt to create a hallway between them.
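For completeness, the inverse of the footnoted formula, splitting a world coordinate back into chunk and in-chunk voxel indices (a sketch; floor semantics so negative world coordinates behave correctly):

using System;

static (int chunk, int voxel) Split(int world) {
    int chunk = (int)Math.Floor(world / 16.0); //floor, not truncation, for negatives
    int voxel = world - chunk * 16;            //always in 0..15
    return (chunk, voxel);
}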
This would probably work somewhat, especially if the center of the dungeon is the main hall/boss room.
It would be different for towns, cities, and surface dungeons. Still seems a little choppy, though. Any ideas?
I faced a similar problem in a Minecraft mod I am writing. I want to have a number of overlapping "empires" which each create structures, but I don't want the structures to step on each other.
So I broke the world into arbitrarily sized tiles. (Compare to your 32x32x32 regions.) I also came up with a "radius of influence": how far from its center point a tile could create structures. Each tile had an instance of a provider class assigned to it, with a unique seed.
Two methods on this class were provided for structure generation.
The first was a function that returned where the provider wanted to create structures, but only to the resolution of chunks. (Compare to your 16x16x16 block sets.) Each provider instance had a priority, so if two providers tried to rezz a structure in the same chunks, the higher-priority one would win.
The second function would be passed a world instance and one of the data items returned by the first function, and would be asked to actually create the structure.
Everything pieces together like this:
We get a request to resolve a certain chunk of the world. We work out the provider for the tile the chunk is in, and then all the providers for the tiles within the maximum radius of that tile. We now have every provider that could influence this chunk. We call the first function on each of them, if it hasn't been called already, and register the chunks each of them has claimed in a global map.
At this point, we've consulted everything that could have an influence on this chunk. We then ask that registry whether someone has claimed this chunk. If so, we call back into that provider (method #2) with the chunk and the world instance and get it to draw its bits for this part of the structure.
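A compact sketch of that two-phase scheme in C# (all names are mine; ChunkPos and World are stand-ins for whatever the engine provides):

using System.Collections.Generic;

struct ChunkPos { public int X, Y, Z; } //stand-in chunk coordinate
class World { }                         //stand-in world handle

interface IStructureProvider {
    int Priority { get; }
    //Phase 1: which chunks this provider claims, plus per-chunk data.
    IEnumerable<(ChunkPos chunk, object data)> ClaimChunks();
    //Phase 2: called back later to actually build inside one claimed chunk.
    void Generate(World world, ChunkPos chunk, object data);
}

class StructureRegistry {
    readonly Dictionary<ChunkPos, (IStructureProvider provider, object data)> _claims
        = new Dictionary<ChunkPos, (IStructureProvider, object)>();

    public void Register(IStructureProvider p) {
        foreach (var (chunk, data) in p.ClaimChunks()) {
            //Higher priority wins when two providers claim the same chunk.
            if (!_claims.TryGetValue(chunk, out var existing) || p.Priority > existing.provider.Priority)
                _claims[chunk] = (p, data);
        }
    }

    public void Resolve(World world, ChunkPos chunk) {
        if (_claims.TryGetValue(chunk, out var claim))
            claim.provider.Generate(world, chunk, claim.data);
    }
}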
Does that give you enough of an idea for a general approach to your problem?