A workaround for a big multidimensional array (jagged array) in C#?

I'm trying to initialize a three-dimensional array to load a voxel world.
The total size of the map should be 2048 x 1024 x 2048. I tried to initialize a jagged array of int, but it throws an out-of-memory exception. What is the size limit?
Size of my table: 2048 * 1024 * 2048 = 4'294'967'296
Does anyone know a way around this problem?
// System.OutOfMemoryException here!
int[][][] matrice = CreateJaggedArray<int[][][]>(2048, 1024, 2048);
// A normal initialization also throws the exception
int[, ,] matrice = new int[2048, 1024, 2048];
static T CreateJaggedArray<T>(params int[] lengths)
{
    return (T)InitializeJaggedArray(typeof(T).GetElementType(), 0, lengths);
}
static object InitializeJaggedArray(Type type, int index, int[] lengths)
{
    Array array = Array.CreateInstance(type, lengths[index]);
    Type elementType = type.GetElementType();
    if (elementType != null)
    {
        for (int i = 0; i < lengths[index]; i++)
        {
            array.SetValue(
                InitializeJaggedArray(elementType, index + 1, lengths), i);
        }
    }
    return array;
}

The maximum size of a single object in C# is 2GB. Since you are creating a multi-dimensional array rather than a jagged array (despite the name of your method) it is a single object that needs to contain all of those items, not several. If you actually used a jagged array then you wouldn't have a single item with all of that data (even though the total memory footprint would be a tad larger, not smaller, it's just spread out more).
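To make the size concrete, here is a minimal sketch of the arithmetic. The class and method names are illustrative only and do not come from the question. The multiplication is done in 64-bit because the element count does not fit in an int.

```csharp
using System;

public static class VoxelMapSize
{
    // Element count of a dense 3D grid, computed in 64-bit to avoid int overflow.
    public static long ElementCount(int sizeX, int sizeY, int sizeZ)
        => (long)sizeX * sizeY * sizeZ;

    // Raw payload size in bytes for a dense grid of int (4 bytes per element).
    public static long SizeInBytes(int sizeX, int sizeY, int sizeZ)
        => ElementCount(sizeX, sizeY, sizeZ) * sizeof(int);
}
```

For 2048 x 1024 x 2048 this gives 4,294,967,296 elements (2^32) and 17,179,869,184 bytes (16 GiB) of raw data, before any per-array overhead. That is well beyond the default 2 GB single-object limit, and also more than a 32-bit process can address at all.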

Thank you to everyone who tried to help me understand and solve my problem.
I tried several ways to load a large amount of data into a table.
After two days, here are my tests and, finally, the solution that can store 4'294'967'296 entries in one array.
I'm adding my final solution in the hope that it helps someone.
The goal
To recap the goal: initialize an integer array [2048/1024/2048] to store 4'294'967'296 values.
Test 1: with the jagged array method (failure)
System.OutOfMemoryException thrown
/* ******************** */
/* Jagged Array method */
/* ******************** */
// allocate the first dimension
int[][][] bigData = new int[2048][][];
for (int x = 0; x < 2048; x++)
{
    // allocate the second dimension
    bigData[x] = new int[1024][];
    for (int y = 0; y < 1024; y++)
    {
        // allocate the last dimension
        bigData[x][y] = new int[2048];
    }
}
Test 2: with the List method (failure)
System.OutOfMemoryException thrown. Dividing the big array into several smaller arrays does not work either; as far as I can tell, "List<>" is subject to the same 2 GB allocation limit as a plain array.
/* ******************** */
/* List method */
/* ******************** */
List<int[,,]> bigData = new List<int[,,]>(512);
for (int a = 0; a < 512; a++)
{
    bigData.Add(new int[256, 128, 256]);
}
Test 3: with MemoryMappedFile (solution)
I finally found the solution!
Use the MemoryMappedFile class, which maps the contents of a file into virtual memory.
MemoryMappedFile MSDN
I used it with a custom class that I found on CodeProject. The initialization is slow, but it works well!
/* ************************ */
/* MemoryMappedFile method */
/* ************************ */
string path = AppDomain.CurrentDomain.BaseDirectory;
var myList = new GenericMemoryMappedArray<int>(2048L * 1024L * 2048L, path);
using (myList)
{
    myList.AutoGrow = false;
    /*
    // the index must be a long: an int counter overflows before reaching 2^32
    for (long a = 0; a < (2048L * 1024L * 2048L); a++)
    {
        myList[a] = (int)a;
    }
    */
    myList[12456] = 8;
    myList[1939848234] = 1;
    // etc...
}
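The GenericMemoryMappedArray class above is third-party code. As a rough sketch of the same idea using only the built-in System.IO.MemoryMappedFiles API, something like the following could work. MappedVoxelGrid and its members are hypothetical names introduced here for illustration, and the row-major layout is an assumption, not something the original answer specifies.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper: a file-backed 3D grid of int addressed by (x, y, z).
public sealed class MappedVoxelGrid : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;
    private readonly long _sizeY;
    private readonly long _sizeZ;

    public MappedVoxelGrid(string path, int sizeX, int sizeY, int sizeZ)
    {
        _sizeY = sizeY;
        _sizeZ = sizeZ;
        long capacity = (long)sizeX * sizeY * sizeZ * sizeof(int);
        // Creates (or overwrites) the backing file with the requested capacity.
        _file = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, capacity);
        // Maps the whole file; a map of this size needs a 64-bit process.
        _view = _file.CreateViewAccessor();
    }

    // Byte offset of voxel (x, y, z), assuming row-major order.
    private long Offset(int x, int y, int z)
        => (((long)x * _sizeY + y) * _sizeZ + z) * sizeof(int);

    public int this[int x, int y, int z]
    {
        get => _view.ReadInt32(Offset(x, y, z));
        set => _view.Write(Offset(x, y, z), value);
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```

With the full 2048 x 1024 x 2048 dimensions the backing file is 16 GiB, so disk space and a 64-bit process are prerequisites. Trying it first with small dimensions is a reasonable way to check the indexing.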

From the MSDN documentation on Arrays (emphasis added)
By default, the maximum size of an Array is 2 gigabytes (GB). In a
64-bit environment, you can avoid the size restriction by setting the
enabled attribute of the gcAllowVeryLargeObjects configuration element
to true in the run-time environment. However, the array will still be
limited to a total of 4 billion elements, and to a maximum index of
0X7FEFFFFF in any given dimension (0X7FFFFFC7 for byte arrays and
arrays of single-byte structures).
So despite the above answers, even if you set the flag to allow a larger object size, the array is still limited to the 32-bit limit on the number of elements.
EDIT: You'll likely have to redesign to eliminate the need for a multidimensional array as you're currently using it (as others have suggested, there are a few ways to do this, from actual jagged arrays to some other collection of dimensions). Given the number of elements, it may be best to use a design that allocates objects/memory dynamically as they are used, instead of arrays that have to pre-allocate everything (unless you don't mind using many gigabytes of memory).
EDITx2: That is, perhaps you can define data structures that describe the filled content rather than every possible voxel in the world, even the "empty" ones. (I'm assuming the vast majority of voxels are "empty" rather than "filled".)
EDIT: Although it is not trivial, your best bet, especially if most of the space is considered "empty", would be to introduce some sort of spatial tree that lets you efficiently query your world to see what objects are in a particular area. For example: octrees (as Eric suggested) or R-trees.

Creating this object as described, either as a standard array or as a jagged array, is going to destroy the locality of reference that allows your CPU to be performant. I recommend you use a structure like this instead:
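class BigArray
{
    // 32 x 16 x 32 cells of 64 x 64 x 64 elements cover 2048 x 1024 x 2048.
    // Note: each ArrayCell must be allocated before it is read; entries start out null.
    ArrayCell[,,] arrayCell = new ArrayCell[32,16,32];
    public int this[int i, int j, int k]
    {
        get { return (arrayCell[i/64, j/64, k/64])[i%64, j%64, k%64]; }
    }
}
class ArrayCell
{
    int[,,] cell = new int[64,64,64];
    public int this[int i, int j, int k]
    {
        get { return cell[i,j,k]; }
    }
}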
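Building on the chunk idea, and on the earlier suggestion to avoid storing "empty" voxels, here is a minimal sketch in which a chunk is only allocated when it is first written. ChunkedVoxelGrid is a hypothetical name, the flat per-chunk int[] layout is a choice made here for brevity, and treating an unallocated chunk as all zeros is an assumption about what "empty" means.

```csharp
using System;

// Hypothetical chunked grid: 64 x 64 x 64 chunks, allocated on first write.
public sealed class ChunkedVoxelGrid
{
    private const int ChunkSize = 64; // edge length of one cubic chunk
    private readonly int _chunksY;
    private readonly int _chunksZ;
    private readonly int[][] _chunks; // null entry = chunk never written (reads as 0)

    public ChunkedVoxelGrid(int sizeX, int sizeY, int sizeZ)
    {
        int chunksX = (sizeX + ChunkSize - 1) / ChunkSize;
        _chunksY = (sizeY + ChunkSize - 1) / ChunkSize;
        _chunksZ = (sizeZ + ChunkSize - 1) / ChunkSize;
        _chunks = new int[chunksX * _chunksY * _chunksZ][];
    }

    public int this[int x, int y, int z]
    {
        get
        {
            int[] chunk = _chunks[ChunkIndex(x, y, z)];
            return chunk == null ? 0 : chunk[LocalIndex(x, y, z)];
        }
        set
        {
            int c = ChunkIndex(x, y, z);
            // Allocate the 64 * 64 * 64 chunk lazily on first write.
            _chunks[c] ??= new int[ChunkSize * ChunkSize * ChunkSize];
            _chunks[c][LocalIndex(x, y, z)] = value;
        }
    }

    // Index of the chunk containing (x, y, z) in the flat chunk table.
    private int ChunkIndex(int x, int y, int z)
        => ((x / ChunkSize) * _chunksY + y / ChunkSize) * _chunksZ + z / ChunkSize;

    // Index of (x, y, z) inside its chunk.
    private static int LocalIndex(int x, int y, int z)
        => ((x % ChunkSize) * ChunkSize + y % ChunkSize) * ChunkSize + z % ChunkSize;
}
```

For the 2048 x 1024 x 2048 world this creates a table of 16,384 chunk references (32 x 16 x 32), and memory use then grows with the number of chunks actually touched. The sketch does no bounds checking on the coordinates, which a real implementation would probably want.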

Related

how to create multidimensional arrays in C# without knowing the size

I need to understand on a practical level how to create a matrix[][] in C# without knowing the size.
And consequently also how to modify it (delete elements depending on a search key).
I have an example loop. Two random string variables. Then I am no longer able to continue....
private static Random random = new Random();
for (int i = 0; i < unKnown; i++)
{
    var firstVar = RandomString(5);
    var secondVar = RandomString(20);
    //Matrix[][]
}
public static string RandomString(int length)
{
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}
Thank you
Arrays are fixed size. They do not adjust their size automatically. E.g. the size is defined when creating the array with
string[] array = new string[10];
If your array is two-dimensional (10x10) and you delete the value at (1,1), the array still remains 10x10, but the element at (1,1) is now null.
If you need a solution that adjusts its size you might want to look into Lists.
Otherwise, I advise you to read the documentation.
It really depends on what you want to do.
If you want a 2D array of values you can use multidimensional arrays. This supports arbitrary dimensions, but for more dimensions data sizes tend to go up and other solutions might be preferable:
var matrix = new double[4, 2];
If you want to do math you might want to use a library like Math.Net with specialized matrix types:
var matrix = Matrix<double>.Build.Dense(4, 2);
If you want to do computer graphics you likely want to use a specialized type, like System.Numerics.Matrix4x4:
var matrix = new Matrix4x4();
It is also not particularly difficult to create your own matrix class that wraps a regular array. This has the benefit that interoperability is often easier, since most frameworks and tools accept pointers or 1D arrays, while few can handle a multidimensional array. Indexing can be done like:
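public class MyMatrix<T>
{
    public int Width { get; }
    public int Height { get; }
    public T[] Data { get; }
    public MyMatrix(int width, int height)
    {
        Width = width;
        Height = height;
        Data = new T[width * height]; // row-major backing storage
    }
    public T this[int x, int y]
    {
        get => Data[y * Width + x];
        set => Data[y * Width + x] = value;
    }
}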
There are also jagged arrays, but there is no guarantee that these will be "square", so they are probably not appropriate if you want a "matrix".
In all cases you will need to loop over the matrix and check each element if you want to do any kind of replacement. Some alternatives require separate loops for width/height, while some allow for a single loop.
I'm not sure if you want a matrix or an array.
Matrix would be like
string[,] matrix = new string[10, 10];
and array would be like
string[] array = new string[10];
You can access the array with array[i] and matrix with matrix[i, j]
You could also use
List<List<string>> matrix = new List<List<string>>(); which may be more convenient to work with and can also be accessed with indexers. For example
matrix[i][j] = "bob";
matrix[i].RemoveAt(j);
Given the problem you have submitted maybe just a List<string> would work for you.
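As a minimal sketch of the list-based approach for the question's two-strings-per-row case, the following builds a table of unknown length and deletes rows by a search key. GrowableTable and both method names are made up for this example, and treating the first column as the key is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class GrowableTable
{
    // Builds a table of (key, value) rows whose length is not known up front.
    public static List<string[]> Build(IEnumerable<(string Key, string Value)> source)
    {
        var rows = new List<string[]>();
        foreach (var (key, value) in source)
            rows.Add(new[] { key, value });
        return rows;
    }

    // Removes every row whose first column equals the search key;
    // returns the number of rows removed.
    public static int RemoveByKey(List<string[]> rows, string searchKey)
        => rows.RemoveAll(row => row[0] == searchKey);
}
```

Rows are still addressed as rows[i][j], and rows.ToArray() yields a string[][] once the size is final.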

How can you fill an entire array pointer with a single value with a single write operation?

I have a pointer to a byte array, and I need to set the values of a certain region of this array to 0. I'm quite familiar with the methods available through the Marshal/Buffer/Array classes, and this problem is not at all hard.
The problem, however, is that I do not want to create excessive arrays, or write every byte one-by-one. All the methods I'm familiar with require full arrays, though, and they obviously don't work with single values.
I've seen several C methods that would achieve the result I'm looking for, but I don't believe I have access to these methods without including the whole C library, or without writing platform-specific code.
My current solution is shown below, but I'd like to achieve this without allocating a new byte array.
Marshal.Copy(new byte[length], 0, ptr + offset, length);
So is there a method in C#, or in an unmanaged language/library that I can use to fill an array (via a pointer) at a certain offset and for a certain length, with one single value (0)?
Miraculously, ChatGPT came rather close when I asked what would be a good solution to this problem. It didn't figure it out, but it suggested that I use spans.
As such, this is the solution I've come up with:
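// Requires an unsafe context: the Span constructor takes a void*, so the IntPtr must be cast
Span<byte> span = new Span<byte>((void*)(ptr + offset), length);
span.Fill(0);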
This solution is about 25 times faster than having to allocate a byte array for very large arrays.
Example benchmarks:
int size = 100_000;
nint ArrayPointer = Marshal.AllocHGlobal(size);
int trials = 1_000_000;
// Runtime was 1582ms
Benchmark("Fill with span", () =>
{
    Span<byte> span = new Span<byte>((void*) ArrayPointer, size);
    span.Fill(0);
}, trials);
// Runtime was 40681ms
Benchmark("Fill with allocation", () =>
{
    Marshal.Copy(new byte[size], 0, ArrayPointer, size);
}, trials);
// Far too slow to get a result with these settings
Benchmark("Fill individually", () =>
{
    for (int i = 0; i < size; i++)
    {
        Marshal.WriteByte(ArrayPointer + i, 0);
    }
}, trials);
// Results with size = 100_000 and trials = 100_000
// Fill with span: 176ms
// Fill with allocation: 4382ms
// Fill individually: 24672ms
You can use Array.Fill for this. It is a static method, so the array is passed as the first argument:
Array.Fill(arrayName, 'X', 4, 10); // fill 10 elements of a char array, starting at index 4, with 'X'
https://learn.microsoft.com/en-us/dotnet/api/system.array.fill?view=net-7.0
Note: The documentation for C# is quite good. You can go to the website and see all the methods for arrays. If you really care how this is implemented, you could even go to GitHub and read the source code.
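For a managed byte array (as opposed to the raw pointer in the question), a region can be filled without a temporary array in either of the two ways sketched below. RegionFill and its method names are illustrative only.

```csharp
using System;

public static class RegionFill
{
    // Sets buffer[offset .. offset + length) to value without allocating a temporary array.
    public static void FillRegion(byte[] buffer, int offset, int length, byte value)
        => Array.Fill(buffer, value, offset, length);

    // Same idea via a span over the region; Span<T>.Clear() zero-fills it.
    public static void ZeroRegion(byte[] buffer, int offset, int length)
        => buffer.AsSpan(offset, length).Clear();
}
```

Neither method applies directly to unmanaged memory. For a pointer, the span has to be constructed from the pointer in an unsafe context, as in the question's own solution.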

Quickly apply a known sort order (old index -> new index mapping) to an array

I am trying to performance tune a routine that needs to sort 8 large arrays "in tandem", where one of the arrays is the array to sort by.
I've already taken care of sorting the first array using a method of my choosing (I'm using TimSort).
I've already made sure that my array of sorted objects has a property denoting each object's original index (e.g. sortedArray[0].OriginalIndex would return 2983 if unsortedArray[2983] turned out to be the first item).
This means if I were to loop over my now sorted array of objects, I think I can just get all other arrays sorted in the same order in the following naïve way:
private List<object[]> SortInTandem(IndexedObj[] sortedArray, List<object[]> arraysToSort)
{
    for (int i = 0; i < sortedArray.Length; i++)
    {
        int originalIndex = sortedArray[i].OriginalIndex;
        // Swap the corresponding index from all other arrays to their new position
        foreach (object[] array in arraysToSort)
        {
            object temp = array[i];
            array[i] = array[originalIndex];
            array[originalIndex] = temp;
        }
    }
    return arraysToSort; // Returning original arrays sorted in-place
}
I believe the above algorithm to have the desired result, but it feels less efficient than it could be. (3 times as many assignments as needed?)
I also considered the following approach which minimizes assignments, but requires allocating new arrays to store sorted items, and garbage collecting the old arrays (unless I come up with a way to recycle the allocations between calls):
private List<object[]> SortInTandem(IndexedObj[] sortedArray, List<object[]> arraysToSort) =>
    arraysToSort.Select(array =>
    {
        object[] tandemArray = new object[array.Length];
        for (int i = 0; i < sortedArray.Length; i++)
            tandemArray[i] = array[sortedArray[i].OriginalIndex];
        return tandemArray;
    }).ToList(); // Returning newly-allocated arrays
This sort of thing is done continuously in a performance-critical area of code, so I'm looking for thoughts on how I might get the best of both worlds.
Thinking more about the second solution above (allocating new arrays) - it occurred to me that the list of arrays passed in can also be "repurposed" once their sorted variant has been produced, so I actually only need to allocate one new array and then I can reuse the ones passed in to prepare additional results:
// Note: the arraysToSort passed in are repurposed to produce the new set of sorted
// arrays, so the caller must discard their references and only use what is returned.
private List<object[]> SortInTandem(IndexedObj[] sortedArray, List<object[]> arraysToSort)
{
    List<object[]> sortedArrays = new List<object[]>(arraysToSort.Count);
    object[] tandemArray = new object[sortedArray.Length];
    for (int i = 0; i < arraysToSort.Count; i++)
    {
        object[] array = arraysToSort[i];
        for (int j = 0; j < sortedArray.Length; j++)
            tandemArray[j] = array[sortedArray[j].OriginalIndex];
        sortedArrays.Add(tandemArray);
        tandemArray = array; // the source array is now free to hold the next result
    }
    return sortedArrays; // Returning one newly-allocated + all but one original arrays repurposed
}
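One caveat about the swap-based loop in the first listing: it does not apply a permutation correctly in general. With original indices {1, 2, 0} and input {A, B, C} it ends with {A, C, B} instead of {B, C, A}, because later swaps move elements that were already placed. A possible middle ground between the listings, sketched below, gathers into a single scratch buffer and copies the result back, so the caller's arrays keep their identity. TandemSort and ApplyOrder are names invented for this example, and the order is passed as a plain int[] rather than IndexedObj[] to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class TandemSort
{
    // Reorders each array in place so that result[i] == original[order[i]],
    // where order[i] is the original index of the item now at sorted position i.
    public static void ApplyOrder(int[] order, IList<object[]> arrays)
    {
        var scratch = new object[order.Length]; // one allocation, reused for every array
        foreach (object[] array in arrays)
        {
            for (int i = 0; i < order.Length; i++)
                scratch[i] = array[order[i]];         // gather: one assignment per element
            Array.Copy(scratch, array, order.Length); // bulk copy back into the caller's array
        }
    }
}
```

If this runs continuously, the scratch buffer could be kept between calls (or rented from a pool) so that steady-state operation allocates nothing. Whether the extra bulk copy is cheaper than repurposing arrays would need measuring.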

Vectorize ND-Array to 1D-Array as fast as possible

I'm trying to vectorize a n-dimensional array as 1-dimensional array in C# to later ease working using linear indexing (this whatever the type of the elements).
So far I was using Buffer.BlockCopy to do that (and even reshaping from n-dimensions to m-dimensions as long as the number of elements was not changing) but unfortunately I came across having to reshape arrays whose elements are not primitive types (double, single, int) and in this case Buffer.BlockCopy does not work (example array of string or whatever other non primitive type).
Currently the solution I have is to make special-case for non-primitive types:
/// <summary>Vectorize ND-array</summary>
/// <param name="arrayNd">ND-Array to vectorize.</param>
/// <returns>Shallow copy as 1D array.</returns>
public static Array Vectorize(Array arrayNd)
{
    // Check arguments
    if (arrayNd == null) { return null; }
    var elementCount = arrayNd.Length;
    // Create 1D array
    var tarray = arrayNd.GetType();
    var telem = tarray.GetElementType();
    var array1D = Array.CreateInstance(telem, elementCount);
    // Shallow copy
    if (telem.IsPrimitive)
    {
        // Block copy only works for arrays whose elements are primitive types (double, single, ...)
        var numberOfBytes = Buffer.ByteLength(arrayNd);
        Buffer.BlockCopy(arrayNd, 0, array1D, 0, numberOfBytes);
    }
    else
    {
        // Slow version for other element types
        // NB: arrayNd.GetValue(...) does not support linear indexing, so the indices must be computed for each dimension (very slow!)
        var indices = new int[arrayNd.Rank];
        for (var i = 0; i < elementCount; i++)
        {
            var idx = i;
            for (var d = arrayNd.Rank - 1; d >= 0; d--)
            {
                var l = arrayNd.GetLength(d);
                indices[d] = idx % l;
                idx /= l;
            }
            array1D.SetValue(arrayNd.GetValue(indices), i);
        }
    }
    // Return as 1D
    return array1D;
}
So this now works for all types:
var double1D = Vectorize(new double[3, 2, 5]); // Fast BlockCopy
var string1D = Vectorize(new string[3, 2, 5]); // Slow solution
I already have an NEnumerator class of my own to speed up computing the indices (instead of using modulo as above), but maybe there is a really fast way to do this sort of "surface memcpy"?
NB1: I'd like to avoid unsafe code but if it's the only way ...
NB2: I really want to work with System.Array (eventually I'll later do a bunch of T[] Vectorize(T[,,,,] array) overloads but that's not the issue)
In my experience, Multidimensional arrays are kind of a pain to work with, in large part since it is so difficult to access the backing data. As far as I know there is no direct way to just copy all the elements for arbitrary types.
Because of this I tend to prefer a custom type for my 2D types that uses a linear array as backing storage, and index like myArray[y * width + x]. With this model the whole exercise becomes a no-op, and you can get a pointer to pass to native code, it works better with serialization etc.
For 3D/4D arrays you could use the same model, but it seems like the best option for performance is to allocate slices independently, i.e. myArray[z][y * width + x], at least for large arrays. I have not worked with 4D arrays, but in general I would avoid multidimensional arrays if performance is a concern. There might also be libraries out there that suit your needs, but I'm not aware of any specific one.
However, looking at your code I would expect there to be some possible improvements. You are currently doing N calls to GetLength, modulus & divisions for each element. So I would expect something like this to be a bit faster:
public static Array MultidimensionalToLinear(Array arr)
{
    var rank = arr.Rank;
    var lengths = new int[rank];
    for (int i = 0; i < rank; i++)
    {
        lengths[i] = arr.GetLength(i);
    }
    var linearLength = arr.Length;
    var result = Array.CreateInstance(arr.GetType().GetElementType(), linearLength);
    var index = new int[rank];
    var linearIndex = 0;
    CopyRecursive(0, index, result, ref linearIndex);
    void CopyRecursive(int rank, int[] index, Array result, ref int linearIndex)
    {
        var lastIndex = index.Length - 1;
        if (rank == lastIndex)
        {
            for (int i = 0; i < lengths[lastIndex]; i++)
            {
                index[lastIndex] = i;
                result.SetValue(arr.GetValue(index), linearIndex);
                linearIndex++;
            }
        }
        else
        {
            for (int i = 0; i < lengths[rank]; i++)
            {
                index[rank] = i;
                CopyRecursive(rank + 1, index, result, ref linearIndex);
            }
        }
    }
    return result;
}
However, when measuring, it seems that the performance improvement is fairly small, probably because the code in GetValue dominates the runtime.
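Another way to sidestep the per-element index computation and GetValue calls, while staying in safe code on System.Array, is to enumerate the source array. Enumeration of a multidimensional array visits the elements in row-major order (rightmost index varying fastest), which is the same order as linear indexing. The sketch below uses invented names (ArrayFlatten, Flatten) and has not been benchmarked against the versions above; value-type elements are boxed during enumeration, so BlockCopy should remain preferable for primitives.

```csharp
using System;

public static class ArrayFlatten
{
    // Shallow-copies any rectangular array into a new 1D array of the same element type.
    public static Array Flatten(Array source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        Type elementType = source.GetType().GetElementType();
        Array result = Array.CreateInstance(elementType, source.Length);
        int i = 0;
        // foreach over a multidimensional array yields elements in row-major order,
        // so no per-dimension index arithmetic is needed.
        foreach (object item in source)
            result.SetValue(item, i++);
        return result;
    }
}
```

The result can be cast to the concrete type, for example (string[])ArrayFlatten.Flatten(new string[3, 2, 5]).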

How do you initialize a 2 dimensional array when you do not know the size

I have a two dimensional array that I need to load data into. I know the width of the data (22 values) but I do not know the height (estimated around 4000 records, but variable).
I have it declared as follows:
float[,] _calibrationSet;
....
int calibrationRow = 0;
while (recordsToRead)
{
    for (int i = 0; i < SensorCount; i++)
    {
        _calibrationSet[calibrationRow, i] = calibrationArrayView.ReadFloat();
    }
    calibrationRow++;
}
This causes a NullReferenceException, but when I try to initialize it like this:
_calibrationSet = new float[,];
I get the error "Array creation must have array size or array initializer."
Thank you,
Keith
You can't use an array.
Or rather, you would need to pick a size, and if you ended up needing more then you would have to allocate a new, larger, array, copy the data from the old one into the new one, and continue on as before (until you exceed the size of the new one...)
Generally, you would go with one of the collection classes - ArrayList, List<>, LinkedList<>, etc. - which one depends a lot on what you're looking for; List will give you the closest thing to what i described initially, while LinkedList<> will avoid the problem of frequent re-allocations (at the cost of slower access and greater memory usage).
Example:
List<float[]> _calibrationSet = new List<float[]>();
// ...
while (recordsToRead)
{
    float[] record = new float[SensorCount];
    for (int i = 0; i < SensorCount; i++)
    {
        record[i] = calibrationArrayView.ReadFloat();
    }
    _calibrationSet.Add(record);
}
// access later: _calibrationSet[record][sensor]
Oh, and it's worth noting (as Grauenwolf did), that what i'm doing here doesn't give you the same memory structure as a single, multi-dimensional array would - under the hood, it's an array of references to other arrays that actually hold the data. This speeds up building the array a good deal by making reallocation cheaper, but can have an impact on access speed (and, of course, memory usage). Whether this is an issue for you depends a lot on what you'll be doing with the data after it's loaded... and whether there are two hundred records or two million records.
You can't create an array in .NET (as opposed to declaring a reference to it, which is what you did in your example) without specifying its dimensions, either explicitly, or implicitly by specifying a set of literal values when you initialize it. (e.g. int[,] array4 = { { 1, 2 }, { 3, 4 }, { 5, 6 }, { 7, 8 } };)
You need to use a variable-size data structure first (a generic list of 22-element 1-d arrays would be the simplest) and then allocate your array and copy your data into it after your read is finished and you know how many rows you need.
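The read-into-a-list-then-copy approach described above could look roughly like this once reading has finished. CalibrationLoader and ToRectangular are hypothetical names, and the row list is assumed to have been filled by the reading loop.

```csharp
using System;
using System.Collections.Generic;

public static class CalibrationLoader
{
    // Copies a list of equal-length rows into a rectangular float[rows, columns] array.
    public static float[,] ToRectangular(List<float[]> rows, int columns)
    {
        var result = new float[rows.Count, columns];
        int rowBytes = columns * sizeof(float);
        for (int r = 0; r < rows.Count; r++)
        {
            if (rows[r].Length != columns)
                throw new ArgumentException($"Row {r} does not have {columns} values.");
            // Buffer.BlockCopy works on primitive element types and takes byte offsets.
            Buffer.BlockCopy(rows[r], 0, result, r * rowBytes, rowBytes);
        }
        return result;
    }
}
```

With 22 sensors per record, the call would be CalibrationLoader.ToRectangular(records, 22), and the result is indexed as result[row, sensor].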
I would just use a list, then convert that list into an array.
You will notice here that I used a jagged array (float[][]) instead of a square array (float [,]). Besides being the "standard" way of doing things, it should be much faster. When converting the data from a list to an array you only have to copy [calibrationRow] pointers. Using a square array, you would have to copy [calibrationRow] x [SensorCount] floats.
var tempCalibrationSet = new List<float[]>();
const int SensorCount = 22;
while (recordsToRead())
{
    var record = new float[SensorCount];
    for (int i = 0; i < SensorCount; i++)
    {
        record[i] = calibrationArrayView.ReadFloat();
    }
    tempCalibrationSet.Add(record); // a List grows via Add; indexing into an empty list throws
}
float[][] _calibrationSet = tempCalibrationSet.ToArray();
I generally use the nicer collections for this sort of work (List, ArrayList etc.) and then (if really necessary) convert to T[,] when I'm done.
You would either need to preallocate the array to a maximum size (float[999,22]), or use a different data structure.
I guess you could copy/resize on the fly (but I don't think you'd want to).
I think the List sounds reasonable.
You could also use a two-dimensional ArrayList (from System.Collections) -- you create an ArrayList, then put another ArrayList inside it. This will give you the dynamic resizing you need, but at the expense of a bit of overhead.
