Vectorize ND-Array to 1D-Array as fast as possible - c#

I'm trying to vectorize an n-dimensional array into a 1-dimensional array in C#, to make later work with linear indexing easier (whatever the type of the elements).
So far I was using Buffer.BlockCopy to do that (it even handled reshaping from n dimensions to m dimensions, as long as the number of elements did not change), but unfortunately I came across arrays whose elements are not primitive types (double, single, int), and in that case Buffer.BlockCopy does not work (for example, arrays of string or of any other non-primitive type).
Currently the solution I have is to special-case non-primitive types:
/// <summary>Vectorize ND-array</summary>
/// <param name="arrayNd">ND-Array to vectorize.</param>
/// <returns>Surface copy as 1D array.</returns>
public static Array Vectorize(Array arrayNd)
{
    // Check arguments
    if (arrayNd == null) { return null; }
    var elementCount = arrayNd.Length;
    // Create 1D array
    var tarray = arrayNd.GetType();
    var telem = tarray.GetElementType();
    var array1D = Array.CreateInstance(telem, elementCount);
    // Surface copy
    if (telem.IsPrimitive)
    {
        // Block copy only works for arrays whose elements are primitive types (double, single, ...)
        var numberOfBytes = Buffer.ByteLength(arrayNd);
        Buffer.BlockCopy(arrayNd, 0, array1D, 0, numberOfBytes);
    }
    else
    {
        // Slow version for other element types
        // NB: arrayNd.GetValue(...) does not support linear indexing, so we need to compute indices for each dimension (very slow!)
        var indices = new int[arrayNd.Rank];
        for (var i = 0; i < elementCount; i++)
        {
            var idx = i;
            for (var d = arrayNd.Rank - 1; d >= 0; d--)
            {
                var l = arrayNd.GetLength(d);
                indices[d] = idx % l;
                idx /= l;
            }
            array1D.SetValue(arrayNd.GetValue(indices), i);
        }
    }
    // Return as 1D
    return array1D;
}
So this now works for all types:
var double1D = Vectorize(new double[3, 2, 5]); // Fast BlockCopy
var string1D = Vectorize(new string[3, 2, 5]); // Slow solution
I already have an NEnumerator class of my own to speed up computing the indices (instead of using modulo as above), but maybe there is a really fast way to do this sort of "surface memcpy"?
NB1: I'd like to avoid unsafe code, but if it's the only way ...
NB2: I really want to work with System.Array (eventually I'll do a bunch of T[] Vectorize(T[,,,,] array) overloads, but that's not the issue).
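For reference, a sketch of a variant of the slow path that avoids computing the per-dimension indices: System.Array implements IEnumerable, and its enumerator walks a multidimensional array in row-major order, which matches the linear layout:
// Sketch: let the enumerator do the index bookkeeping; still one SetValue
// call per element, but no modulo/division arithmetic.
var i = 0;
foreach (var item in arrayNd)
{
    array1D.SetValue(item, i++);
}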

In my experience, multidimensional arrays are kind of a pain to work with, in large part because it is so difficult to access the backing data. As far as I know, there is no direct way to just copy all the elements for arbitrary types.
Because of this I tend to prefer a custom type for my 2D data that uses a linear array as backing storage, indexed like myArray[y * width + x]. With this model the whole exercise becomes a no-op, you can get a pointer to pass to native code, it works better with serialization, etc.
For 3D/4D arrays you could use the same model, but it seems like the best option for performance is to allocate slices independently, i.e. myArray[z][y * width + x], at least for large arrays. I have not worked with 4D arrays, but in general I would avoid multidimensional arrays if performance is a concern. There might also be libraries out there that suit your needs, but I'm not aware of any specific one.
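To illustrate, here is a minimal sketch of such a linear-backed 2D type (the name and shape are mine, not from any library):
// Sketch: 2D array backed by a flat T[]; exposing the backing array makes
// "vectorizing" a no-op and eases interop/serialization.
public class Array2D<T>
{
    public readonly T[] Data;
    public readonly int Width;
    public readonly int Height;

    public Array2D(int width, int height)
    {
        Width = width;
        Height = height;
        Data = new T[width * height];
    }

    public T this[int x, int y]
    {
        get => Data[y * Width + x];
        set => Data[y * Width + x] = value;
    }
}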
However, looking at your code, I would expect some improvements to be possible. You are currently doing, for each element, one GetLength call plus one modulus and one division per dimension. So I would expect something like this to be a bit faster:
public static Array MultidimensionalToLinear(Array arr)
{
    var rank = arr.Rank;
    var lengths = new int[rank];
    for (int i = 0; i < rank; i++)
    {
        lengths[i] = arr.GetLength(i);
    }
    var linearLength = arr.Length;
    var result = Array.CreateInstance(arr.GetType().GetElementType(), linearLength);
    var index = new int[rank];
    var linearIndex = 0;
    CopyRecursive(0, index, result, ref linearIndex);

    // NB: the parameters are named dim/idx/res/linIdx rather than reusing the
    // outer names, since a (non-static) local function may not shadow
    // enclosing locals.
    void CopyRecursive(int dim, int[] idx, Array res, ref int linIdx)
    {
        var lastDim = idx.Length - 1;
        if (dim == lastDim)
        {
            for (int i = 0; i < lengths[lastDim]; i++)
            {
                idx[lastDim] = i;
                res.SetValue(arr.GetValue(idx), linIdx);
                linIdx++;
            }
        }
        else
        {
            for (int i = 0; i < lengths[dim]; i++)
            {
                idx[dim] = i;
                CopyRecursive(dim + 1, idx, res, ref linIdx);
            }
        }
    }
    return result;
}
However, when measuring, the performance improvement seems fairly small, probably because the code in GetValue dominates the runtime.
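If the element type can be supplied at the call site, a generic overload sidesteps the Array.GetValue/SetValue calls entirely; a sketch (note the non-generic enumerator still boxes value-type elements, so this mainly helps for reference types such as string):
// Sketch: Array's enumerator yields elements in row-major order, so no
// index arithmetic and no Get/SetValue calls.
public static T[] VectorizeGeneric<T>(Array arrayNd)
{
    var result = new T[arrayNd.Length];
    var i = 0;
    foreach (T item in arrayNd)
        result[i++] = item;
    return result;
}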

Related

C# How to paste 2D array[,] onto second 2D array[,] on specified indexes

Is there any better (less complex) way to paste the smaller array[,] into the bigger array[,] than looping through them? My code:
private void PasteRoomIntoTheMap(GameObject[,] room, (int, int) positionOnTheMap)
{
    (int, int) roomSize = (room.GetLength(0), room.GetLength(1));
    int roomsXAxesDimention = 0;
    for (int i = positionOnTheMap.Item1; i < positionOnTheMap.Item1 + roomSize.Item1; i++)
    {
        // Reset the Y counter for each new row ('<' rather than '<=' to stay in bounds)
        int roomsYAxesDimention = 0;
        for (int j = positionOnTheMap.Item2; j < positionOnTheMap.Item2 + roomSize.Item2; j++)
        {
            Map[i, j] = room[roomsXAxesDimention, roomsYAxesDimention];
            roomsYAxesDimention++;
        }
        roomsXAxesDimention++;
    }
}
(I didn't run it yet. There might be some errors but I hope that you will understand this method)
The code that I would love to have:
Map [5...15, 2...5] = room[0...room.GetLength(0), 0...room.GetLength(1)]
Short answer: No
Longer answer:
Multidimensional arrays are really a wrapper around a 1D array, with some special code to handle indices etc. They do, however, lack some useful features.
You could perhaps improve your example code a bit by copying entire rows at a time using Array.Copy instead of processing item by item. This might help performance if you have very many items to copy, but is unlikely to make any difference if the sizes are small.
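For example, a sketch of the row-at-a-time approach (Array.Copy addresses a multidimensional array by linear, row-major index, so each row of the room is one contiguous segment):
// Sketch: paste 'room' into 'map' one row at a time.
void PasteRoom(GameObject[,] map, GameObject[,] room, (int x, int y) pos)
{
    int roomHeight = room.GetLength(0);
    int roomWidth = room.GetLength(1);
    int mapWidth = map.GetLength(1);
    for (int row = 0; row < roomHeight; row++)
    {
        Array.Copy(
            room, row * roomWidth,                 // source: start of this room row
            map, (pos.x + row) * mapWidth + pos.y, // destination: linear offset in map
            roomWidth);                            // length: one room row
    }
}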
You could probably get the syntax you desire by creating something like an ArraySlice2D class that references the original 2D array as well as the region you want to copy. But you would need to either create your own wrapper with index operators, or create extension methods to do it. Perhaps something like:
public class My2DArray<T>
{
    private T[,] array;

    public ArraySlice2D<T> this[Range x, Range y]
    {
        get => new ArraySlice2D<T>(array, x, y);
        set
        {
            // Copy values from value's X/Y ranges into this array's x/y ranges
        }
    }
}
You might also consider writing your own 2D array class that wraps a 1D array. I find that this often makes interoperability easier since most systems can handle 1D arrays, but few handle multidimensional arrays without conversion.
I am not aware of any shortcut for that, and there may be too many slightly distinct use cases for one, so the most appropriate solution is probably to implement the functionality yourself.
It is advisable, however, to add a size check to your method so you don't end up indexing nonexistent areas of the big array:
private bool ReplaceWithSubArray(int[,] array, int[,] subarray, (int x, int y) indices)
{
    if (array.GetLength(0) < subarray.GetLength(0) + indices.x ||
        array.GetLength(1) < subarray.GetLength(1) + indices.y)
    {
        // 'array' is too small
        return false;
    }
    for (int x = 0; x < subarray.GetLength(0); x++)
    {
        for (int y = 0; y < subarray.GetLength(1); y++)
        {
            array[x + indices.x, y + indices.y] = subarray[x, y];
        }
    }
    return true;
}

Quickly apply a known sort order (old index -> new index mapping) to an array

I am trying to performance tune a routine that needs to sort 8 large arrays "in tandem", where one of the arrays is the array to sort by.
I've already taken care of sorting the first array using a method of my choosing (I'm using TimSort)
I've already taken care of making sure my array of sorted objects has a property denoting its original index (e.g. sortedArray[0].OriginalIndex would return 2983 if previously unsortedArray[2983] turned out to be the first item).
This means if I were to loop over my now sorted array of objects, I think I can just get all other arrays sorted in the same order in the following naïve way:
private List<object[]> SortInTandem(IndexedObj[] sortedArray, List<object[]> arraysToSort)
{
    for (int i = 0; i < sortedArray.Length; i++)
    {
        int originalIndex = sortedArray[i].OriginalIndex;
        // Swap the corresponding index from all other arrays to their new position
        foreach (object[] array in arraysToSort)
        {
            object temp = array[i];
            array[i] = array[originalIndex];
            array[originalIndex] = temp;
        }
    }
    return arraysToSort; // Returning original arrays sorted in-place
}
I believe the above algorithm to have the desired result, but it feels less efficient than it could be. (3 times as many assignments as needed?)
I also considered the following approach which minimizes assignments, but requires allocating new arrays to store sorted items, and garbage collecting the old arrays (unless I come up with a way to recycle the allocations between calls):
private List<object[]> SortInTandem(IndexedObj[] sortedArray, List<object[]> arraysToSort) =>
    arraysToSort.Select(array =>
    {
        object[] tandemArray = new object[array.Length];
        for (int i = 0; i < sortedArray.Length; i++)
            tandemArray[i] = array[sortedArray[i].OriginalIndex];
        return tandemArray;
    }).ToList(); // Returning newly-allocated arrays
This sort of thing is done continuously in a performance-critical area of code, so I'm looking for thoughts on how I might get the best of both worlds.
Thinking more about the second solution above (allocating new arrays) - it occurred to me that the list of arrays passed in can also be "repurposed" once their sorted variant has been produced, so I actually only need to allocate one new array and then I can reuse the ones passed in to prepare additional results:
// Note: the arrays in arraysToSort will be repurposed to produce the new set of sorted
// arrays, so the caller must discard their references and use only what is returned.
private List<object[]> SortInTandem(IndexedObj[] sortedArray, List<object[]> arraysToSort)
{
    List<object[]> sortedArrays = new List<object[]>(arraysToSort.Count);
    object[] tandemArray = new object[sortedArray.Length];
    for (int i = 0; i < arraysToSort.Count; i++)
    {
        object[] array = arraysToSort[i];
        for (int j = 0; j < sortedArray.Length; j++)
            tandemArray[j] = array[sortedArray[j].OriginalIndex];
        sortedArrays.Add(tandemArray);
        tandemArray = array; // recycle the input array for the next iteration
    }
    return sortedArrays; // One newly-allocated array + all but one original arrays repurposed
}
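A possible way to get the best of both worlds (a sketch, untested): apply the old-index mapping strictly in place by following the permutation's cycles, so each element is written exactly once and only one temporary is held at a time. (One caveat with the swap-based version above: once a slot has been overwritten, a later iteration may read a value that has already been displaced; walking the cycles explicitly avoids that.)
// Sketch: sortedArray[i].OriginalIndex is the original index of the item
// that belongs at position i. 'visited' is scratch space reusable between calls.
private static void ApplyOrderInPlace(object[] array, IndexedObj[] sortedArray, bool[] visited)
{
    Array.Clear(visited, 0, visited.Length);
    for (int start = 0; start < array.Length; start++)
    {
        if (visited[start]) continue;
        object held = array[start];
        int i = start;
        // Pull each element from its original slot until the cycle closes.
        while (sortedArray[i].OriginalIndex != start)
        {
            array[i] = array[sortedArray[i].OriginalIndex];
            visited[i] = true;
            i = sortedArray[i].OriginalIndex;
        }
        array[i] = held;
        visited[i] = true;
    }
}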

A workaround for a big multidimensional array (Jagged Array) C#?

I'm trying to initialize an array in three dimensions to load a voxel world.
The total size of the map should be 2048/1024/2048. I tried to initialize a jagged array of int, but it throws a memory exception. What is the size limit?
Size of my array: 2048 * 1024 * 2048 = 4'294'967'296 elements
Anyone know a way around this problem?
// System.OutOfMemoryException here!
int[][][] matrice = CreateJaggedArray<int[][][]>(2048, 1024, 2048);
// if I try normal initialization it also throws the exception
int[,,] matrice = new int[2048, 1024, 2048];

static T CreateJaggedArray<T>(params int[] lengths)
{
    return (T)InitializeJaggedArray(typeof(T).GetElementType(), 0, lengths);
}

static object InitializeJaggedArray(Type type, int index, int[] lengths)
{
    Array array = Array.CreateInstance(type, lengths[index]);
    Type elementType = type.GetElementType();
    if (elementType != null)
    {
        for (int i = 0; i < lengths[index]; i++)
        {
            array.SetValue(
                InitializeJaggedArray(elementType, index + 1, lengths), i);
        }
    }
    return array;
}
The maximum size of a single object in C# is 2GB. Since you are creating a multi-dimensional array rather than a jagged array (despite the name of your method) it is a single object that needs to contain all of those items, not several. If you actually used a jagged array then you wouldn't have a single item with all of that data (even though the total memory footprint would be a tad larger, not smaller, it's just spread out more).
Thank you so much to everyone who tried to help me understand and solve my problem.
I tried several solutions for loading a lot of data and storing it in an array.
After two days, here are my tests and, finally, the solution that can store 4'294'967'296 entries in one array.
I am adding my final solution, hoping it helps someone.
The goal
I recall the goal: initialize an integer array [2048/1024/2048] to store 4'294'967'296 values.
Test 1: with JaggedArray method (failure)
System.OutOfMemoryException thrown.
/* ******************** */
/* Jagged Array method */
/* ******************** */
// allocate the first dimension
bigData = new int[2048][][];
for (int x = 0; x < 2048; x++)
{
    // allocate the second dimension
    bigData[x] = new int[1024][];
    for (int y = 0; y < 1024; y++)
    {
        // allocate the last dimension
        bigData[x][y] = new int[2048];
    }
}
Test 2: with List method (failure)
System.OutOfMemoryException thrown (I tried dividing the big array into several small arrays; it does not work, because List<> unfortunately allows a maximum of 2 GB of allocation, just like a simple array).
/* ******************** */
/* List method */
/* ******************** */
List<int[,,]> bigData = new List<int[,,]>(512);
for (int a = 0; a < 512; a++)
{
    bigData.Add(new int[256, 128, 256]);
}
Test 3: with MemoryMappedFile (Solution)
I finally found the solution!
Use the MemoryMappedFile class, which maps the contents of a file into virtual memory.
MemoryMappedFile MSDN
Used together with a custom class that I found on CodeProject here. The initialization is long, but it works well!
/* ************************ */
/* MemoryMappedFile method */
/* ************************ */
string path = AppDomain.CurrentDomain.BaseDirectory;
var myList = new GenericMemoryMappedArray<int>(2048L * 1024L * 2048L, path);
using (myList)
{
    myList.AutoGrow = false;
    /*
    for (long a = 0; a < (2048L * 1024L * 2048L); a++)
    {
        myList[a] = (int)a;
    }
    */
    myList[12456] = 8;
    myList[1939848234] = 1;
    // etc...
}
From the MSDN documentation on Arrays (emphasis added)
By default, the maximum size of an Array is 2 gigabytes (GB). In a
64-bit environment, you can avoid the size restriction by setting the
enabled attribute of the gcAllowVeryLargeObjects configuration element
to true in the run-time environment. However, the array will still be
limited to a total of 4 billion elements, and to a maximum index of
0X7FEFFFFF in any given dimension (0X7FFFFFC7 for byte arrays and
arrays of single-byte structures).
So despite the above answers, even if you set the flag to allow a larger object size, the array is still limited to a 32-bit count of elements.
EDIT: You'll likely have to redesign to eliminate the need for a multidimensional array as you're currently using it (as others have suggested, there are a few ways to do this, whether with actual jagged arrays or with some other collection of slices). Given the number of elements, it may be best to use a design that dynamically allocates objects/memory as they are used, instead of arrays that have to pre-allocate everything (unless you don't mind using many gigabytes of memory). EDITx2: That is, perhaps you can define data structures that store only the filled content rather than every possible voxel in the world, including the "empty" ones. (I'm assuming the vast majority of voxels are "empty" rather than "filled".)
EDIT: Although not trivial, especially if most of the space is considered "empty", your best bet would be to introduce some sort of spatial tree that lets you efficiently query your world to see what objects are in a particular area. For example: octrees (as Eric suggested) or R-trees.
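As a concrete illustration of the "store only filled content" idea, a minimal sketch (the class and the coordinate packing are mine, purely hypothetical; 11 bits per axis covers 0..2047):
// Sketch: sparse voxel storage keyed by packed coordinates; voxels never
// written read back as 0 ("empty"), so only filled content consumes memory.
class SparseVoxels
{
    private readonly Dictionary<long, int> filled = new Dictionary<long, int>();

    private static long Key(int x, int y, int z) =>
        ((long)x << 22) | ((long)y << 11) | (long)z; // 11 bits per axis

    public int this[int x, int y, int z]
    {
        get => filled.TryGetValue(Key(x, y, z), out var value) ? value : 0;
        set
        {
            if (value == 0) filled.Remove(Key(x, y, z));
            else filled[Key(x, y, z)] = value;
        }
    }
}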
Creating this object as described, either as a standard array or as a jagged array, is going to destroy the locality of reference that allows your CPU to be performant. I recommend you use a structure like this instead:
class BigArray
{
    // 2048 x 1024 x 2048 voxels split into 64x64x64 chunks: 32 x 16 x 32 cells
    ArrayCell[,,] arrayCell = new ArrayCell[32, 16, 32];
    public int this[int i, int j, int k]
    {
        // Division selects the chunk, remainder indexes within it
        get
        {
            var cell = arrayCell[i / 64, j / 64, k / 64];
            return cell == null ? 0 : cell[i % 64, j % 64, k % 64];
        }
        set
        {
            var cell = arrayCell[i / 64, j / 64, k / 64];
            if (cell == null)
                arrayCell[i / 64, j / 64, k / 64] = cell = new ArrayCell(); // allocate chunks lazily
            cell[i % 64, j % 64, k % 64] = value;
        }
    }
}
class ArrayCell
{
    int[,,] cell = new int[64, 64, 64];
    public int this[int i, int j, int k]
    {
        get { return cell[i, j, k]; }
        set { cell[i, j, k] = value; }
    }
}
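Hypothetical usage: with lazily allocated chunks as above, only the chunks actually written consume memory, one 64x64x64 block of int (1 MB) at a time:
var world = new BigArray();
world[1000, 512, 2000] = 7;     // allocates only chunk [15, 8, 31]
int v = world[1000, 512, 2000]; // 7; voxels never written read back as 0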

DataContract + Multi-dimensional Arrays -- Any solution for this?

From MSDN:
Combining collection types (having collections of collections) is
allowed. Jagged arrays are treated as collections of collections.
Multidimensional arrays are not supported.
So, if you can't normally serialize a multidimensional array, how does one get around this efficiently? My thought is to have a property that flattens the array, serialize that collection, and unflatten it during deserialization, but I'm not sure whether that's efficient.
Has anyone found a solution to this before?
I should note that the reason I think flattening might work is that my dimensions are fixed (they are hard-coded).
Yes, you can flatten it or jagged-ize it, whichever is more convenient.
Any time you rip through a 2-D array you're expending O(n) effort (one "processing" unit per item in the input). It's not much trouble to convert between 2-D and 1-D and back, as you said. Unless you're dealing in really high volumes (array size or call frequency of the web service), or on a highly constrained system (.NET Compact or .NET Micro Framework), I doubt this would really be a big issue. Things like sorting are what get expensive.
string[,] input = new string[5, 3];
string[] output = new string[input.Length];
for (int i = 0; i < input.GetLength(0); i++)
{
    for (int j = 0; j < input.GetLength(1); j++)
    {
        // Row-major: element (i, j) lands at offset i * width + j
        output[i * input.GetLength(1) + j] = input[i, j];
    }
}
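And the reverse mapping for deserialization, a sketch (it assumes the dimensions are known, which holds here since they are hard-coded):
// Rebuild the 2-D array from the flattened copy
string[,] restored = new string[5, 3];
for (int i = 0; i < restored.GetLength(0); i++)
{
    for (int j = 0; j < restored.GetLength(1); j++)
    {
        restored[i, j] = output[i * restored.GetLength(1) + j];
    }
}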
Maybe with an extension method:
public static string[][] Jaggedize(this string[,] input)
{
    string[][] output = new string[input.GetLength(0)][];
    for (int i = 0; i < input.GetLength(0); i++)
    {
        output[i] = new string[input.GetLength(1)];
        for (int j = 0; j < input.GetLength(1); j++)
        {
            output[i][j] = input[i, j];
        }
    }
    return output;
}

What's the best way to implement an unfixed multi-dimensional array in C#.NET?

For example: an array of varying-length arrays of integers.
In C++, we are used to doing things like:
int** TwoDimAry = new int*[n];
for (int i = 0; i < n; i++)
{
    TwoDimAry[i] = new int[i + n];
}
In this case, if n == 3 then the result would be an array of three pointers to arrays of integers, and would appear like this:
http://img263.imageshack.us/img263/4149/multidimarray.png
Of course, .NET arrays are managed collections, so you don't have to deal with the manual allocation/deletion.
But declaring:
int[][] TwoDimAry ;
... in C# does not appear to have the same effect - namely, you have to initialize ALL of the sub-arrays at the same time, and they have to be the same length.
I need my sub-arrays to be independent of each-other, as they are in native C++.
What's the best way to implement this using managed collections? Are there any drawbacks I should be aware of?
Like C++, you need to initialize every subarray in an int[][].
However, they don't need to have the same length. (That's why it's called a jagged array)
For example:
int[][] jagged1 = new int[][] { new int[1], new int[2], new int[3] };
Your C++ code can be translated directly to C#:
int[][] TwoDimAry = new int[n][];
for (int i = 0; i < n; i++) {
    TwoDimAry[i] = new int[i + n];
}
Here is an example with a jagged array initialized with 1, 2, 3, .. elements for each row
int N = 20;
int[][] array = new int[N][]; // First index is rows, second is columns
for (int i = 0; i < N; i++)
{
    array[i] = new int[i + 1]; // Initialize i-th row with i + 1 columns
    for (int j = 0; j <= i; j++)
    {
        array[i][j] = N * j + i; // Set a value for each column in the row
    }
}
I have used this enough to know that there aren't many drawbacks overall. Hybrid approaches with List<int[]> or List<int>[] also work.
In .NET, most of the time you don't want to use arrays this way at all. This is because in .NET, arrays are thought of as a different animal from a collection. Managed, yes. Collection? Well, maybe, but the term confuses things because it means something specific. If you want a collection (hint: most of the time you do), look in the System.Collections namespace, particularly System.Collections.Generic. It sounds like you really want either a List<List<int>> or a List<int[]>.
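For instance, a sketch mirroring the C++ snippet above with the collection-based approach:
int n = 3;
var twoDim = new List<List<int>>();
for (int i = 0; i < n; i++)
{
    twoDim.Add(new List<int>(new int[i + n])); // i-th row starts with i + n zeros
}
twoDim[0].Add(42); // rows can also grow independently afterwards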
