Differences between a multidimensional array "[,]" and an array of arrays "[][]" in C#?

What are the differences between multidimensional arrays double[,] and array of arrays double[][] in C#?
Is there a difference?
What is the best use for each one?

Arrays of arrays (jagged arrays) are faster than multidimensional arrays and can be used more effectively. Multidimensional arrays have nicer syntax.
If you write some simple code using jagged and multidimensional arrays and then inspect the compiled assembly with an IL disassembler, you will see that storing to and retrieving from jagged (or single-dimensional) arrays are simple IL instructions, while the same operations on multidimensional arrays are method invocations, which are always slower.
Consider the following methods:
static void SetElementAt(int[][] array, int i, int j, int value)
{
    array[i][j] = value;
}

static void SetElementAt(int[,] array, int i, int j, int value)
{
    array[i, j] = value;
}
Their IL will be the following:
.method private hidebysig static void SetElementAt(int32[][] 'array',
                                                   int32 i,
                                                   int32 j,
                                                   int32 'value') cil managed
{
  // Code size 7 (0x7)
  .maxstack 8
  IL_0000: ldarg.0
  IL_0001: ldarg.1
  IL_0002: ldelem.ref
  IL_0003: ldarg.2
  IL_0004: ldarg.3
  IL_0005: stelem.i4
  IL_0006: ret
} // end of method Program::SetElementAt

.method private hidebysig static void SetElementAt(int32[0...,0...] 'array',
                                                   int32 i,
                                                   int32 j,
                                                   int32 'value') cil managed
{
  // Code size 10 (0xa)
  .maxstack 8
  IL_0000: ldarg.0
  IL_0001: ldarg.1
  IL_0002: ldarg.2
  IL_0003: ldarg.3
  IL_0004: call instance void int32[0...,0...]::Set(int32, int32, int32)
  IL_0009: ret
} // end of method Program::SetElementAt
When using jagged arrays you can easily perform operations such as row swapping and row resizing. Maybe in some cases multidimensional arrays are safer to use, but even Microsoft FxCop recommends jagged arrays over multidimensional ones when you use it to analyse your projects.
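For example, a row swap is just a reference swap, and a row resize touches only that row; a minimal sketch (the array contents here are made up for illustration):

using System;

int[][] grid = { new[] { 1, 2 }, new[] { 3, 4, 5 } };

// Swap two rows by swapping the references; no elements are copied.
(grid[0], grid[1]) = (grid[1], grid[0]);

// Resize one row independently of the others.
Array.Resize(ref grid[0], 10);
Console.WriteLine(grid[0].Length); // 10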

A multidimensional array creates a nice linear memory layout while a jagged array implies several extra levels of indirection.
Looking up the value jagged[3][6] in a jagged array declared as var jagged = new int[10][] (with each row then allocated as, say, new int[5]) works like this:
Look up the element at index 3 (which is an array).
Look up the element at index 6 in that array (which is a value).
For each dimension in this case, there's an additional look up (this is an expensive memory access pattern).
A multidimensional array is laid out linearly in memory; the actual element offset is computed from the indexes. Given the array var mult = new int[10,30], the Length property returns the total number of elements, i.e. 10 * 30 = 300.
The Rank property of a jagged array is always 1, but a multidimensional array can have any rank. The GetLength method of any array can be used to get the length of each dimension; for the multidimensional array in this example, mult.GetLength(1) returns 30.
Indexing the multidimensional array is faster in theory: for example, the element mult[1,7] lives at linear offset 1 * 30 + 7 = 37, and a single look-up at that offset from the array's base address fetches it. This is a better memory access pattern because only one memory block is involved.
A multidimensional array is therefore allocated as one contiguous memory block, while a jagged array does not have to be rectangular: for example, jagged[1].Length does not have to equal jagged[2].Length, which would be true for any multidimensional array.
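To make those properties concrete, here is a small sketch using the arrays from the text above:

using System;

var mult = new int[10, 30];
Console.WriteLine(mult.Length);       // 300, the total number of elements
Console.WriteLine(mult.Rank);         // 2
Console.WriteLine(mult.GetLength(1)); // 30

var jagged = new int[10][];
jagged[0] = new int[5];
Console.WriteLine(jagged.Rank);   // 1, an array of arrays is still rank 1
Console.WriteLine(jagged.Length); // 10, only the outer array is counted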
Performance
Performance-wise, multidimensional arrays should be faster, a lot faster, but because of a poor CLR implementation they are not.
23.084 16.634 15.215 15.489 14.407 13.691 14.695 14.398 14.551 14.252
25.782 27.484 25.711 20.844 19.607 20.349 25.861 26.214 19.677 20.171
5.050 5.085 6.412 5.225 5.100 5.751 6.650 5.222 6.770 5.305
The first row shows timings of jagged arrays, the second multidimensional arrays, and the third my hand-rolled single-dimensional version, which is how it should be. The program is shown below; FYI, this was tested running Mono. (The Windows timings are vastly different, mostly due to variations in the CLR implementation.)
On Windows, the timings of the jagged arrays are greatly superior, about the same as my own interpretation of what multidimensional array look-up should be like; see Single(). Sadly, the Windows JIT compiler is really stupid, and this unfortunately makes these performance discussions difficult; there are too many inconsistencies.
These are the timings I got on Windows. Same deal here: the first row is jagged arrays, the second multidimensional, and the third my own implementation of multidimensional. Note how much slower this is on Windows compared to Mono.
8.438 2.004 8.439 4.362 4.936 4.533 4.751 4.776 4.635 5.864
7.414 13.196 11.940 11.832 11.675 11.811 11.812 12.964 11.885 11.751
11.355 10.788 10.527 10.541 10.745 10.723 10.651 10.930 10.639 10.595
Source code:
using System;
using System.Diagnostics;

static class ArrayPref
{
    const string Format = "{0,7:0.000} ";

    static void Main()
    {
        Jagged();
        Multi();
        Single();
    }

    static void Jagged()
    {
        const int dim = 100;
        for (var passes = 0; passes < 10; passes++)
        {
            var timer = new Stopwatch();
            timer.Start();
            var jagged = new int[dim][][];
            for (var i = 0; i < dim; i++)
            {
                jagged[i] = new int[dim][];
                for (var j = 0; j < dim; j++)
                {
                    jagged[i][j] = new int[dim];
                    for (var k = 0; k < dim; k++)
                    {
                        jagged[i][j][k] = i * j * k;
                    }
                }
            }
            timer.Stop();
            Console.Write(Format,
                (double)timer.ElapsedTicks / TimeSpan.TicksPerMillisecond);
        }
        Console.WriteLine();
    }

    static void Multi()
    {
        const int dim = 100;
        for (var passes = 0; passes < 10; passes++)
        {
            var timer = new Stopwatch();
            timer.Start();
            var multi = new int[dim, dim, dim];
            for (var i = 0; i < dim; i++)
            {
                for (var j = 0; j < dim; j++)
                {
                    for (var k = 0; k < dim; k++)
                    {
                        multi[i, j, k] = i * j * k;
                    }
                }
            }
            timer.Stop();
            Console.Write(Format,
                (double)timer.ElapsedTicks / TimeSpan.TicksPerMillisecond);
        }
        Console.WriteLine();
    }

    static void Single()
    {
        const int dim = 100;
        for (var passes = 0; passes < 10; passes++)
        {
            var timer = new Stopwatch();
            timer.Start();
            var single = new int[dim * dim * dim];
            for (var i = 0; i < dim; i++)
            {
                for (var j = 0; j < dim; j++)
                {
                    for (var k = 0; k < dim; k++)
                    {
                        single[i * dim * dim + j * dim + k] = i * j * k;
                    }
                }
            }
            timer.Stop();
            Console.Write(Format,
                (double)timer.ElapsedTicks / TimeSpan.TicksPerMillisecond);
        }
        Console.WriteLine();
    }
}

Simply put, a multidimensional array is similar to a table in a DBMS.
An array of arrays (jagged array) lets each element hold another array of the same type, of variable length.
So, if you are sure that the structure of the data looks like a table (fixed rows/columns), you can use a multidimensional array. A jagged array has a fixed number of elements, and each element can hold an array of variable length.
E.g. pseudocode:
int[,] data = new int[2,2];
data[0,0] = 1;
data[0,1] = 2;
data[1,0] = 3;
data[1,1] = 4;
Think of the above as a 2x2 table:
1 | 2
3 | 4
int[][] jagged = new int[3][];
jagged[0] = new int[4] { 1, 2, 3, 4 };
jagged[1] = new int[2] { 11, 12 };
jagged[2] = new int[3] { 21, 22, 23 };
Think of the above as each row having a variable number of columns:
1 | 2 | 3 | 4
11 | 12
21 | 22 | 23

Update .NET 6:
With the release of .NET 6 I decided it was a good time to revisit this topic. I rewrote the test code for the new .NET and ran it with the requirement that each part runs for at least a second. The benchmark was done on an AMD Ryzen 5600X.
The results? It's complicated. It seems that a single flat array is the most performant for smaller and larger arrays (below ~25x25x25 and above ~200x200x200), with jagged arrays being fastest in between. Unfortunately, multidimensional arrays appear from my testing to be by far the slowest option, at best performing twice as slow as the fastest option. But it depends on what you need the arrays for, because jagged arrays can take much longer to initialize: on a 50^3 cube, initialization was roughly 3 times longer than for the single-dimensional array. Multidimensional was only a little slower than single-dimensional.
The conclusion? If you need fast code, benchmark it yourself on the machine it's going to run on. CPU architecture can completely change the relative performance of each method.
Numbers!
Method              Ticks/Iteration    Scaled to the best

Array size 1x1x1 (10,000,000 iterations):
Jagged:             0.15               4.28
Single:             0.035              1
Multi-dimensional:  0.77               22

Array size 10x10x10 (25,000 iterations):
Jagged:             15                 1.67
Single:             9                  1
Multi-dimensional:  56                 6.2

Array size 25x25x25 (25,000 iterations):
Jagged:             157                1.3
Single:             120                1
Multi-dimensional:  667                5.56

Array size 50x50x50 (10,000 iterations):
Jagged:             1,140              1
Single:             2,440              2.14
Multi-dimensional:  5,210              4.57

Array size 100x100x100 (10,000 iterations):
Jagged:             9,800              1
Single:             19,800             2
Multi-dimensional:  41,700             4.25

Array size 200x200x200 (1,000 iterations):
Jagged:             161,622            1
Single:             175,507            1.086
Multi-dimensional:  351,275            2.17

Array size 500x500x500 (100 iterations):
Jagged:             4,057,413          1.5
Single:             2,709,301          1
Multi-dimensional:  5,359,393          1.98
Don't trust me? Run it yourself and verify.
Note: a constant size seems to give jagged arrays an edge, but it is not significant enough to change the order in my benchmarks. In some instances I measured a ~7% decrease in performance when using a size taken from user input for jagged arrays, no difference for single arrays, and a very small difference (~1% or less) for multidimensional arrays. It is most prominent in the middle range, where jagged arrays take the lead.
using System.Diagnostics;

const string Format = "{0,7:0.000} ";
const int TotalPasses = 25000;
const int Size = 50;

Stopwatch timer = new();

var functionList = new List<Action> { Jagged, Single, SingleStandard, Multi };

Console.WriteLine("{0,5}{1,20}{2,20}{3,20}{4,20}", "Run", "Ticks", "ms", "Ticks/Instance", "ms/Instance");

foreach (var item in functionList)
{
    var warmup = Test(item);
    var run = Test(item);
    Console.WriteLine($"{item.Method.Name}:");
    PrintResult("warmup", warmup);
    PrintResult("run", run);
    Console.WriteLine();
}

static void PrintResult(string name, long ticks)
{
    Console.WriteLine("{0,10}{1,20}{2,20}{3,20}{4,20}",
        name,
        ticks,
        string.Format(Format, (decimal)ticks / TimeSpan.TicksPerMillisecond),
        (decimal)ticks / TotalPasses,
        (decimal)ticks / TotalPasses / TimeSpan.TicksPerMillisecond);
}

long Test(Action func)
{
    timer.Restart();
    func();
    timer.Stop();
    return timer.ElapsedTicks;
}

static void Jagged()
{
    for (var passes = 0; passes < TotalPasses; passes++)
    {
        var jagged = new int[Size][][];
        for (var i = 0; i < Size; i++)
        {
            jagged[i] = new int[Size][];
            for (var j = 0; j < Size; j++)
            {
                jagged[i][j] = new int[Size];
                for (var k = 0; k < Size; k++)
                {
                    jagged[i][j][k] = i * j * k;
                }
            }
        }
    }
}

static void Multi()
{
    for (var passes = 0; passes < TotalPasses; passes++)
    {
        var multi = new int[Size, Size, Size];
        for (var i = 0; i < Size; i++)
        {
            for (var j = 0; j < Size; j++)
            {
                for (var k = 0; k < Size; k++)
                {
                    multi[i, j, k] = i * j * k;
                }
            }
        }
    }
}

static void Single()
{
    for (var passes = 0; passes < TotalPasses; passes++)
    {
        var single = new int[Size * Size * Size];
        for (var i = 0; i < Size; i++)
        {
            int iOffset = i * Size * Size;
            for (var j = 0; j < Size; j++)
            {
                var jOffset = iOffset + j * Size;
                for (var k = 0; k < Size; k++)
                {
                    single[jOffset + k] = i * j * k;
                }
            }
        }
    }
}

static void SingleStandard()
{
    for (var passes = 0; passes < TotalPasses; passes++)
    {
        var single = new int[Size * Size * Size];
        for (var i = 0; i < Size; i++)
        {
            for (var j = 0; j < Size; j++)
            {
                for (var k = 0; k < Size; k++)
                {
                    single[i * Size * Size + j * Size + k] = i * j * k;
                }
            }
        }
    }
}
Lesson learned: always include the CPU in benchmark reports, because it makes a difference. Did it this time? I don't know, but I suspect it might have.
Original answer:
I would like to update on this, because in .NET Core multidimensional arrays are faster than jagged arrays. I ran the tests from John Leidegren and these are the results on .NET Core 2.0 preview 2. I increased the dimension value to make any possible influence from background apps less visible.
Debug (code optimization disabled)
Running jagged
187.232 200.585 219.927 227.765 225.334 222.745 224.036 222.396 219.912 222.737
Running multi-dimensional
130.732 151.398 131.763 129.740 129.572 159.948 145.464 131.930 133.117 129.342
Running single-dimensional
91.153 145.657 111.974 96.436 100.015 97.640 94.581 139.658 108.326 92.931
Release (code optimization enabled)
Running jagged
108.503 95.409 128.187 121.877 119.295 118.201 102.321 116.393 125.499 116.459
Running multi-dimensional
62.292 60.627 60.611 60.883 61.167 60.923 62.083 60.932 61.444 62.974
Running single-dimensional
34.974 33.901 34.088 34.659 34.064 34.735 34.919 34.694 35.006 34.796
I looked into the disassembly and this is what I found:
jagged[i][j][k] = i * j * k; needed 34 instructions to execute
multi[i, j, k] = i * j * k; needed 11 instructions to execute
single[i * dim * dim + j * dim + k] = i * j * k; needed 23 instructions to execute
I wasn't able to identify why single-dimensional arrays were still faster than multidimensional ones, but my guess is that it has to do with some optimization made in the CPU.

Preface: This comment is intended to address the answer provided by okutane regarding the performance difference between jagged arrays and multidimensional ones.
The assertion that one type is slower than the other because of method calls isn't correct. One is slower than the other because of more complicated bounds-checking algorithms. You can easily verify this by looking not at the IL, but at the compiled assembly. For example, on my .NET 4.5 install, accessing an element (via a pointer in edx) stored in a two-dimensional array pointed to by ecx, with indexes stored in eax and edx, looks like this:
sub eax,[ecx+10]
cmp eax,[ecx+08]
jae oops //jump to throw out of bounds exception
sub edx,[ecx+14]
cmp edx,[ecx+0C]
jae oops //jump to throw out of bounds exception
imul eax,[ecx+0C]
add eax,edx
lea edx,[ecx+eax*4+18]
Here you can see that there's no overhead from method calls. The bounds checking is just very convoluted thanks to the possibility of non-zero lower bounds, which is functionality not on offer with jagged arrays. If we remove the sub, cmp, and jae instructions for the non-zero cases, the code pretty much resolves to (x*y_max+y)*sizeof(ptr)+sizeof(array_header). This calculation is about as fast as anything else for random access to an element (one multiply could be replaced by a shift, since that's the whole reason we choose bytes to be sized as powers of two bits).
Another complication is that there are plenty of cases where a modern compiler will optimize away the nested bounds-checking for element access while iterating over a single-dimensional array. The result is code that basically just advances an index pointer over the contiguous memory of the array. Naive iteration over multidimensional arrays generally involves an extra layer of nested logic, so a compiler is less likely to optimize the operation. So, even though the bounds-checking overhead of accessing a single element amortizes out to constant runtime with respect to array dimensions and sizes, a simple test case to measure the difference may take many times longer to execute.
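As an aside, the non-zero lower bounds that those extra sub instructions pay for can only be created through Array.CreateInstance; a minimal sketch:

using System;

// A 2x2 array whose indexes start at 1 instead of 0.
var a = (int[,])Array.CreateInstance(typeof(int), new[] { 2, 2 }, new[] { 1, 1 });
a[1, 1] = 42;
Console.WriteLine(a.GetLowerBound(0)); // 1
Console.WriteLine(a[1, 1]);            // 42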

Multidimensional arrays are n-dimensional rectangular blocks.
So int[,] square = new int[2,2] is a square 2x2 matrix, and int[,,] cube = new int[3,3,3] is a cube, 3x3x3. Equal side lengths are not required.
Jagged arrays are just arrays of arrays: an array where each cell contains an array.
So a multidimensional array is always rectangular, while a jagged array may not be: each cell can contain an array of arbitrary length!

This might have been mentioned in the answers above, but not explicitly: with a jagged array you can use array[row] to refer to a whole row of data, but this is not allowed for multi-d arrays.
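A small sketch of what that allows:

using System;
using System.Linq;

int[][] table = { new[] { 1, 2, 3 }, new[] { 4, 5, 6 } };
int[] row = table[0];         // a reference to the whole first row
Console.WriteLine(row.Sum()); // 6

// With an int[,] there is no equivalent; a row must be copied out element by element.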

In addition to the other answers, note that a multidimensional array is allocated as one big chunky object on the heap. This has some implications:
Some multidimensional arrays will get allocated on the Large Object Heap (LOH) where their equivalent jagged array counterparts would not be.
The GC will need to find a single contiguous free block of memory to allocate a multidimensional array, whereas a jagged array might be able to fill in gaps caused by heap fragmentation... this isn't usually an issue in .NET because of compaction, but the LOH doesn't get compacted by default (you have to ask for it, and you have to ask every time you want it).
You'll want to look into <gcAllowVeryLargeObjects> for multidimensional arrays way before the issue will ever come up if you only ever use jagged arrays.
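The opt-in compaction mentioned above goes through the standard GCSettings API; a minimal sketch:

using System;
using System.Runtime;

// Request LOH compaction for the next blocking collection; the mode then resets to Default.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();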

I thought I'd chime in here from the future with some performance results from .NET 5, seeing as that will be the platform which everyone uses from now on.
These are the same tests that John Leidegren used (in 2009).
My results (.NET 5.0.1):
Debug:
(Jagged)
5.616 4.719 4.778 5.524 4.559 4.508 5.913 6.107 5.839 5.270
(Multi)
6.336 7.477 6.124 5.817 6.516 7.098 5.272 6.091 25.034 6.023
(Single)
4.688 3.494 4.425 6.176 4.472 4.347 4.976 4.754 3.591 4.403
Release (code optimizations on):
(Jagged)
2.614 2.108 3.541 3.065 2.172 2.936 1.681 1.724 2.622 1.708
(Multi)
3.371 4.690 4.502 4.153 3.651 3.637 3.580 3.854 3.841 3.802
(Single)
1.934 2.102 2.246 2.061 1.941 1.900 2.172 2.103 1.911 1.911
Ran on a 6-core 3.7 GHz AMD Ryzen 1600 machine.
It looks as though the performance ratio is still roughly the same. I'd say unless you're really optimizing hard, just use multidimensional arrays, as the syntax is slightly easier to use.

Jagged arrays are arrays of arrays, or arrays in which each row contains an array of its own.
These row arrays can have lengths different from those in the other rows.
Declaring and Allocating an Array of Arrays
The only difference in the declaration of a jagged array compared to a regular multidimensional array is that we do not have just one pair of brackets; with jagged arrays, we have a pair of brackets per dimension. We allocate them this way:
int[][] exampleJaggedArray;
exampleJaggedArray = new int[2][];
exampleJaggedArray[0] = new int[5];
exampleJaggedArray[1] = new int[3];
Initializing an Array of Arrays
int[][] exampleJaggedArray = {
    new int[] {5, 7, 2},
    new int[] {10, 20, 40},
    new int[] {3, 25}
};
Memory Allocation
Jagged arrays are an aggregation of references. A jagged array does not directly contain any arrays; rather, its elements point to them. The sizes are not known up front, which is why the CLR just keeps references to the internal arrays. After we allocate memory for one array element of the jagged array, the reference starts pointing to the newly created block in the dynamic memory.
The variable exampleJaggedArray is stored in the execution stack of the program and points to a block in the dynamic memory, which contains a sequence of three references to three other blocks in memory; each of them contains an array of integer numbers, the elements of the jagged array.

I am parsing .il files generated by ildasm to build a database of assemblies, classes, methods, and stored procedures for use during a conversion. I came across the following, which broke my parsing.
.method private hidebysig instance uint32[0...,0...]
        GenerateWorkingKey(uint8[] key,
                           bool forEncryption) cil managed
The book Expert .NET 2.0 IL Assembler, by Serge Lidin, Apress, published 2006, Chapter 8, Primitive Types and Signatures, pp. 149-150, explains:
<type>[] is termed a vector of <type>.
<type>[<bounds> [<bounds>**]] is termed an array of <type>, where ** means "may be repeated" and [ ] means "optional".
Examples: let <type> = int32.
1) int32[...,...] is a two-dimensional array of undefined lower bounds and sizes.
2) int32[2...5] is a one-dimensional array with lower bound 2 and size 4.
3) int32[0...,0...] is a two-dimensional array with lower bounds 0 and undefined size.
Tom

Using a test based on the one by John Leidegren, I benchmarked the result using .NET 4.7.2, which is the relevant version for my purposes, and thought I could share. I originally started with this comment in the dotnet core GitHub repository.
It appears that the performance varies greatly as the array size changes, at least on my setup: one Xeon processor with 4 physical / 8 logical cores.
w = initialize an array and put int i * j in it.
wr = do w, then in another loop set an int x to the value at [i,j].
As the array size grows, the multidimensional array appears to outperform.
Size       rw  Method  Mean      Error      StdDev     Gen 0/1k Op  Gen 1/1k Op  Gen 2/1k Op  Allocated Memory/Op
1800*500   w   Jagged  2.445 ms  0.0959 ms  0.1405 ms  578.1250     281.2500     85.9375      3.46 MB
1800*500   w   Multi   3.079 ms  0.2419 ms  0.3621 ms  269.5313     269.5313     269.5313     3.43 MB
2000*4000  w   Jagged  50.29 ms  3.262 ms   4.882 ms   5937.5000    3375.0000    937.5000     30.62 MB
2000*4000  w   Multi   26.34 ms  1.797 ms   2.690 ms   218.7500     218.7500     218.7500     30.52 MB
2000*4000  wr  Jagged  55.30 ms  3.066 ms   4.589 ms   5937.5000    3375.0000    937.5000     30.62 MB
2000*4000  wr  Multi   32.23 ms  2.798 ms   4.187 ms   285.7143     285.7143     285.7143     30.52 MB
1000*2000  wr  Jagged  11.18 ms  0.5397 ms  0.8078 ms  1437.5000    578.1250     234.3750     7.69 MB
1000*2000  wr  Multi   6.622 ms  0.3238 ms  0.4847 ms  210.9375     210.9375     210.9375     7.63 MB
Update: the last two tests were run with double[,] instead of int[,]. The difference appears significant considering the errors. With int, the ratio of the means for jagged vs. multidimensional is between 1.53x and 1.86x; with doubles it is between 1.88x and 2.42x.
Size       rw  Method  Mean      Error     StdDev    Gen 0/1k Op  Gen 1/1k Op  Gen 2/1k Op  Allocated Memory/Op
1000*2000  wr  Jagged  26.83 ms  1.221 ms  1.790 ms  3062.5000    1531.2500    531.2500     15.31 MB
1000*2000  wr  Multi   12.61 ms  1.018 ms  1.524 ms  156.2500     156.2500     156.2500     15.26 MB

Related

Vectorize ND-Array to 1D-Array as fast as possible

I'm trying to vectorize an n-dimensional array as a 1-dimensional array in C# to later ease working with linear indexing (whatever the type of the elements).
So far I was using Buffer.BlockCopy to do that (and even to reshape from n dimensions to m dimensions, as long as the number of elements did not change), but unfortunately I came across having to reshape arrays whose elements are not primitive types (double, single, int), and in that case Buffer.BlockCopy does not work (for example an array of string, or any other non-primitive type).
Currently the solution I have is to special-case non-primitive types:
/// <summary>Vectorize ND-array</summary>
/// <param name="arrayNd">ND-Array to vectorize.</param>
/// <returns>Surface copy as 1D array.</returns>
public static Array Vectorize(Array arrayNd)
{
    // Check arguments
    if (arrayNd == null) { return null; }
    var elementCount = arrayNd.Length;

    // Create 1D array
    var tarray = arrayNd.GetType();
    var telem = tarray.GetElementType();
    var array1D = Array.CreateInstance(telem, elementCount);

    // Surface copy
    if (telem.IsPrimitive)
    {
        // Block copy only works for arrays whose elements are primitive types (double, single, ...)
        var numberOfBytes = Buffer.ByteLength(arrayNd);
        Buffer.BlockCopy(arrayNd, 0, array1D, 0, numberOfBytes);
    }
    else
    {
        // Slow version for other element types
        // NB: arrayNd.GetValue(...) does not support linear indexing, so we need to compute indices for each dimension (very slow!)
        var indices = new int[arrayNd.Rank];
        for (var i = 0; i < elementCount; i++)
        {
            var idx = i;
            for (var d = arrayNd.Rank - 1; d >= 0; d--)
            {
                var l = arrayNd.GetLength(d);
                indices[d] = idx % l;
                idx /= l;
            }
            array1D.SetValue(arrayNd.GetValue(indices), i);
        }
    }

    // Return as 1D
    return array1D;
}
So this now works for all types:
var double1D = Vectorize(new double[3, 2, 5]); // Fast BlockCopy
var string1D = Vectorize(new string[3, 2, 5]); // Slow solution
I already have an NEnumerator class of my own to speed up computing the indices (instead of using modulo as above), but maybe there is a really fast way of doing this sort of "surface memcpy"?
NB1: I'd like to avoid unsafe code, but if it's the only way...
NB2: I really want to work with System.Array (eventually I'll do a bunch of T[] Vectorize(T[,,,,] array) overloads later, but that's not the issue).
In my experience, multidimensional arrays are kind of a pain to work with, in large part because it is so difficult to access the backing data. As far as I know there is no direct way to just copy all the elements for arbitrary types.
Because of this I tend to prefer a custom type for my 2D data that uses a linear array as backing storage and indexes like myArray[y * width + x]. With this model the whole exercise becomes a no-op, you can get a pointer to pass to native code, it works better with serialization, etc.
For 3D/4D arrays you could use the same model, but it seems like the best option for performance is to allocate slices independently, i.e. myArray[z][y * width + x], at least for large arrays. I have not worked with 4D arrays, but in general I would avoid multidimensional arrays if performance is a concern. There might also be libraries out there that suit your needs, but I'm not aware of any specific one.
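A minimal sketch of that kind of wrapper (the Array2D name and its members are illustrative, not an existing library type):

// Illustrative 2D wrapper over a flat backing array.
public sealed class Array2D<T>
{
    private readonly T[] _data;
    public int Width { get; }
    public int Height { get; }

    public Array2D(int width, int height)
    {
        Width = width;
        Height = height;
        _data = new T[width * height];
    }

    public T this[int x, int y]
    {
        get => _data[y * Width + x];
        set => _data[y * Width + x] = value;
    }

    // Expose the backing store for Buffer.BlockCopy, serialization, interop, etc.
    public T[] Raw => _data;
}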
However, looking at your code, I would expect some improvement to be possible. You are currently doing calls to GetLength, plus modulus and division operations, for every dimension of every element. So I would expect something like this to be a bit faster:
public static Array MultidimensionalToLinear(Array arr)
{
    var rank = arr.Rank;
    var lengths = new int[rank];
    for (int i = 0; i < rank; i++)
    {
        lengths[i] = arr.GetLength(i);
    }

    var linearLength = arr.Length;
    var result = Array.CreateInstance(arr.GetType().GetElementType(), linearLength);

    var index = new int[rank];
    var linearIndex = 0;
    CopyRecursive(0, index, result, ref linearIndex);

    // Parameter names differ from the enclosing locals to avoid shadowing them.
    void CopyRecursive(int dim, int[] indices, Array dst, ref int linIdx)
    {
        var lastIndex = indices.Length - 1;
        if (dim == lastIndex)
        {
            for (int i = 0; i < lengths[lastIndex]; i++)
            {
                indices[lastIndex] = i;
                dst.SetValue(arr.GetValue(indices), linIdx);
                linIdx++;
            }
        }
        else
        {
            for (int i = 0; i < lengths[dim]; i++)
            {
                indices[dim] = i;
                CopyRecursive(dim + 1, indices, dst, ref linIdx);
            }
        }
    }

    return result;
}
However, when measuring, the performance improvement seems fairly small, probably due to the code in GetValue dominating the runtime.

2D Array vs Array of Arrays in tight loop performance C#

I had a look and couldn't see anything quite answering my question.
I'm not exactly the best at creating accurate 'real life' tests, so I'm not sure if that's the problem here. Basically, I want to create a few simple neural networks to create something to the effect of Gridworld. Performance of these neural networks will be critical, and I don't want the hidden layer to be a bottleneck as much as possible.
I would rather use more memory and be faster, so I opted to use arrays instead of lists (due to lists having an extra bounds check over arrays). The arrays aren't always full, but because the if statement (checking whether the element is null) is the same until the end, it can be predicted and there is no performance drop from that at all.
My question comes from how I store the data for the network to process. I figured that because 2D arrays store all the data together, they would be better cache-wise and would run faster. But from my mock-up test, an array of arrays performs much better in this scenario.
Some Code:
private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
{
    for (int i = 0; i < testArray.Length; i++) {
        for (int j = 0; j < testArray[i].Length; j++) {
            var inputTotal = data[i].bias;
            for (int k = 0; k < data[i].weights.Length; k++) {
                inputTotal += testArray[i][k];
            }
        }
    }
}

private void Run2DArrayTest(float[,] testArray, Data[] data, int maxI, int maxJ)
{
    for (int i = 0; i < maxI; i++) {
        for (int j = 0; j < maxJ; j++) {
            var inputTotal = data[i].bias;
            for (int k = 0; k < maxJ; k++) {
                inputTotal += testArray[i, k];
            }
        }
    }
}
These are the two functions that are timed. Each 'creature' has its own network (the first for loop), each network has hidden nodes (the second for loop), and I need to find the sum of the weights for each input (the third loop). In my test I stripped it down so that it's not really what I am doing in my actual code, but the same number of loops happen (the data variable would have its own 2D array, but I didn't want to possibly skew the results). From this I was trying to get a feel for which one is faster, and to my surprise the array of arrays was.
Code to start the tests:
// Array of Array test
Stopwatch timer = Stopwatch.StartNew();
RunArrayOfArrayTest(arrayOfArrays, dataArrays);
timer.Stop();
Console.WriteLine("Array of Arrays finished in: " + timer.ElapsedTicks);
// 2D Array test
timer = Stopwatch.StartNew();
Run2DArrayTest(array2D, dataArrays, NumberOfNetworks, NumberOfInputNeurons);
timer.Stop();
Console.WriteLine("2D Array finished in: " + timer.ElapsedTicks);
Just wanted to show how I was testing it. The results from this in release mode give me values like:
Array of Arrays finished in: 8972
2D Array finished in: 16376
Can someone explain to me what I'm doing wrong? Why is an array of arrays faster in this situation by so much? Isn't a 2D array all stored together, meaning it would be more cache-friendly?
Note: I really do need this to be fast, as it needs to sum up hundreds of thousands to millions of numbers per frame, and like I said I don't want this to be a problem. I know this can be multithreaded in the future quite easily, because each network is completely separate and even each node is completely separate.
Last question, I suppose: would something like this be possible to run on the GPU instead? I figure a GPU would not struggle to handle much larger numbers of networks with much larger numbers of input/hidden neurons.
In the CLR, there are two different types of array:
Vectors, which are zero-based, single-dimensional arrays
Arrays, which can have non-zero bases and multiple dimensions
Your "array of arrays" is a "vector of vectors" in CLR terms.
Vectors are significantly faster than arrays, basically. It's possible that arrays could be optimized further in later CLR versions, but I doubt that they'll get the same amount of love as vectors, as they're used relatively rarely. There's not a lot you can do to make CLR arrays faster. As you say, they'll be more cache-friendly, but they have this CLR penalty.
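You can see the vector/array split from C# itself; a small sketch (a rank-1 array with a non-zero lower bound is a CLR "array", not a vector, and is not even assignable to int[]):

using System;

int[] vector = new int[5]; // a vector: zero-based, rank 1
Array nonVector = Array.CreateInstance(typeof(int), new[] { 5 }, new[] { 1 }); // rank 1, lower bound 1

Console.WriteLine(vector.GetType());    // System.Int32[]
Console.WriteLine(nonVector.GetType()); // System.Int32[*], a rank-1 array that is not a vector
Console.WriteLine(nonVector is int[]);  // False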
You can improve your array-of-arrays code already, however, by only performing the first indexing operation once per row:
private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
{
    for (int i = 0; i < testArray.Length; i++) {
        // These don't change in the loop below, so extract them
        var row = testArray[i];
        var inputTotal = data[i].bias;
        var weightLength = data[i].weights.Length;
        for (int j = 0; j < row.Length; j++) {
            for (int k = 0; k < weightLength; k++) {
                inputTotal += row[k];
            }
        }
    }
}
If you want to get the cache friendliness and still use a vector, you could have a single float[] and perform the indexing yourself... but I'd probably start off with the array-of-arrays approach.
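A sketch of that last suggestion, assuming every network has the same number of inputs (the method name and rowLength parameter are illustrative):

private void RunSingleArrayTest(float[] testArray, Data[] data, int rowLength)
{
    for (int i = 0; i < data.Length; i++)
    {
        int rowOffset = i * rowLength; // compute the row base once per network
        var inputTotal = data[i].bias;
        for (int k = 0; k < rowLength; k++)
        {
            inputTotal += testArray[rowOffset + k];
        }
    }
}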

A workaround for a big multidimensional array (Jagged Array) C#?

I'm trying to initialize an array in three dimensions to load a voxel world.
The total size of the map should be 2048 x 1024 x 2048. I tried to initialize a jagged array of int, but it throws a memory exception. What is the size limit?
Size of my table: 2048 * 1024 * 2048 = 4,294,967,296
Does anyone know a way around this problem?
// System.OutOfMemoryException here!
int[][][] matrice = CreateJaggedArray<int[][][]>(2048, 1024, 2048);
// if I try normal initialization, it also throws the exception
int[,,] matrice = new int[2048, 1024, 2048];
static T CreateJaggedArray<T>(params int[] lengths)
{
    return (T)InitializeJaggedArray(typeof(T).GetElementType(), 0, lengths);
}

static object InitializeJaggedArray(Type type, int index, int[] lengths)
{
    Array array = Array.CreateInstance(type, lengths[index]);
    Type elementType = type.GetElementType();

    if (elementType != null)
    {
        for (int i = 0; i < lengths[index]; i++)
        {
            array.SetValue(
                InitializeJaggedArray(elementType, index + 1, lengths), i);
        }
    }

    return array;
}
The maximum size of a single object in C# is 2 GB. Since you are creating a multidimensional array rather than a jagged array (despite the name of your method), it is a single object that needs to contain all of those items, not several. If you actually used a jagged array, you wouldn't have a single item with all of that data (even though the total memory footprint would be a tad larger, not smaller; it's just spread out more).
Thank you so much to everyone who tried to help me understand and solve my problem.
I tried several solutions to be able to load a lot of data and store it in a table.
After two days, here are my tests and finally the solution which can store 4,294,967,296 entries in one array.
I am adding my final solution, hoping it helps someone.
The goal
I recall the goal: initialize an integer array [2048 x 1024 x 2048] to store 4,294,967,296 values.
Test 1: with the JaggedArray method (failure)
A System.OutOfMemoryException is thrown.
/* ******************** */
/* Jagged Array method  */
/* ******************** */

// allocate the first dimension
bigData = new int[2048][][];
for (int x = 0; x < 2048; x++)
{
    // allocate the second dimension
    bigData[x] = new int[1024][];
    for (int y = 0; y < 1024; y++)
    {
        // allocate the last dimension
        bigData[x][y] = new int[2048];
    }
}
Test 2: with the List method (failure)
A System.OutOfMemoryException is thrown. (Dividing the big array into several smaller arrays does not work, because List<> unfortunately has the same 2 GB allocation limit as a simple array.)
/* ******************** */
/* List method          */
/* ******************** */

List<int[,,]> bigData = new List<int[,,]>(512);
for (int a = 0; a < 512; a++)
{
    bigData.Add(new int[256, 128, 256]);
}
Test 3: with MemoryMappedFile (solution)
I finally found the solution!
Use the MemoryMappedFile class, which maps the contents of a file into virtual memory.
MemoryMappedFile MSDN
Use it with the custom class that I found on CodeProject here. The initialization is long, but it works well!
/* ************************ */
/* MemoryMappedFile method  */
/* ************************ */

string path = AppDomain.CurrentDomain.BaseDirectory;
var myList = new GenericMemoryMappedArray<int>(2048L * 1024L * 2048L, path);
using (myList)
{
    myList.AutoGrow = false;
    /*
    for (long a = 0; a < (2048L * 1024L * 2048L); a++)
    {
        myList[a] = (int)a;
    }
    */
    myList[12456] = 8;
    myList[1939848234] = 1;
    // etc...
}
From the MSDN documentation on Arrays (emphasis added)
By default, the maximum size of an Array is 2 gigabytes (GB). In a 64-bit environment, you can avoid the size restriction by setting the enabled attribute of the gcAllowVeryLargeObjects configuration element to true in the run-time environment. However, the array will still be limited to a total of 4 billion elements, and to a maximum index of 0X7FEFFFFF in any given dimension (0X7FFFFFC7 for byte arrays and arrays of single-byte structures).
So despite the above answers, even if you set the flag to allow a larger object size, the array is still limited to the 32-bit limit on the number of elements.
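For completeness, the opt-in goes in app.config and only has an effect in 64-bit processes:

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>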
EDIT: You'll likely have to redesign to eliminate the need for a multidimensional array as you're currently using it (as others have suggested, there are a few ways to do this, between using actual jagged arrays or some other collection of dimensions). Given the scale of the number of elements, it may be best to use a design that dynamically allocates objects/memory as used, instead of arrays that have to pre-allocate it (unless you don't mind using many gigabytes of memory). EDITx2: That is, perhaps you can define data structures that describe the filled content rather than defining every possible voxel in the world, even the "empty" ones. (I'm assuming the vast majority of voxels are "empty" rather than "filled".)
EDIT: Although it is not trivial, especially if most of the space is considered "empty", your best bet would be to introduce some sort of spatial tree that lets you efficiently query your world to see what objects are in a particular area. For example: octrees (as Eric suggested) or R-trees.
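A simple way to store only the filled voxels is a dictionary keyed by coordinates; a sketch, assuming most voxels are empty and 0 means empty:

using System;
using System.Collections.Generic;

// Sparse voxel store: only non-empty voxels are kept.
var filled = new Dictionary<(int x, int y, int z), int>();

filled[(12, 456, 7)] = 8; // set a voxel

// Missing keys are treated as empty (0).
int GetVoxel(int x, int y, int z) =>
    filled.TryGetValue((x, y, z), out var v) ? v : 0;

Console.WriteLine(GetVoxel(12, 456, 7)); // 8
Console.WriteLine(GetVoxel(0, 0, 0));    // 0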
Creating this object as described, either as a standard array or as a jagged array, is going to destroy the locality of reference that allows your CPU to be performant. I recommend you use a structure like this instead:
class BigArray
{
    ArrayCell[,,] arrayCell = new ArrayCell[32, 16, 32];

    public int this[int i, int j, int k]
    {
        get { return arrayCell[i / 64, j / 64, k / 64][i % 64, j % 64, k % 64]; }
        set
        {
            var cell = arrayCell[i / 64, j / 64, k / 64];
            if (cell == null)
            {
                // Allocate chunks lazily, on first write.
                cell = arrayCell[i / 64, j / 64, k / 64] = new ArrayCell();
            }
            cell[i % 64, j % 64, k % 64] = value;
        }
    }
}

class ArrayCell
{
    int[,,] cell = new int[64, 64, 64];

    public int this[int i, int j, int k]
    {
        get { return cell[i, j, k]; }
        set { cell[i, j, k] = value; }
    }
}
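Usage would then look like this (with the sketch above allocating chunks on first write):

var world = new BigArray();
world[2000, 1000, 15] = 42;
Console.WriteLine(world[2000, 1000, 15]); // 42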

What's the best way to implement an unfixed multi-dimensional array in C#.NET?

For example: an array of varying-length arrays of integers.
In C++, we are used to doing things like:
int** TwoDimAry = new int*[n];
for (int i = 0; i < n; i++)
{
    TwoDimAry[i] = new int[i + n];
}
In this case, if n == 3 then the result would be an array of three pointers to arrays of integers, and would appear like this:
http://img263.imageshack.us/img263/4149/multidimarray.png
Of course, .NET arrays are managed collections, so you don't have to deal with the manual allocation/deletion.
But declaring:
int[][] TwoDimAry;
... in C# does not appear to have the same effect; namely, you have to initialize ALL of the sub-arrays at the same time, and they have to be the same length.
I need my sub-arrays to be independent of each other, as they are in native C++.
What's the best way to implement this using managed collections? Are there any drawbacks I should be aware of?
Like in C++, you need to initialize every subarray in an int[][].
However, they don't need to have the same length. (That's why it's called a jagged array.)
For example:
int[][] jagged1 = new int[][] { new int[1], new int[2], new int[3] };
Your C++ code can be translated directly to C#:
int[][] TwoDimAry = new int[n][];
for (int i = 0; i < n; i++) {
    TwoDimAry[i] = new int[i + n];
}
Here is an example of a jagged array initialized with 1, 2, 3, ..., N elements per row:
int N = 20;
int[][] array = new int[N][]; // First index is rows, second is columns
for (int i = 0; i < N; i++)
{
    array[i] = new int[i + 1]; // Initialize the i-th row with i + 1 columns
    for (int j = 0; j <= i; j++)
    {
        array[i][j] = N * j + i; // Set a value for each column in the row
    }
}
I have used this enough to know that there aren't many drawbacks overall. Hybrid approaches with List<int[]> or List<int>[] also work.
In .NET, most of the time you don't want to use arrays this way at all. This is because in .NET, arrays are thought of as a different animal from a collection. Managed, yes. Collection? Well, maybe, but that confuses terms because it means something special. If you want a collection (hint: most of the time you do), look in the System.Collections namespace, particularly System.Collections.Generic. It sounds like you really want either a List<List<int>> or a List<int[]>.

Accessing Dictionary and Multi-Dimension Array is Slow

I found that accessing a Dictionary and a multidimensional array can be slow, which is quite a puzzle since the access time for both a dictionary and an array is O(1).
This is my code
public struct StateSpace
{
    public double q;
    public double v;
    public double a;
}

public class AccessTest
{
    public Dictionary<int, Dictionary<double, StateSpace>> ModeStateSpace;
    public double[,] eigenVectors;

    public void AccessJob(int n, double times)
    {
        var sumDisplacement = new double[6];
        for (int i = 0; i < n; i++)
        {
            var modeDisplacement = ModeStateSpace[i][times].q; // takes 5.81 sec
            for (int j = 0; j < 6; j++)
            {
                var eigenVector = eigenVectors[i, j]; // takes 5.69 sec
                sumDisplacement[i] += eigenVector * modeDisplacement; // takes 1.06 sec
            }
        }
    }
}
Notice the interesting part? The arithmetic takes ~1 sec, but accessing the dictionary and the multidimensional array takes ~5 sec each! Any idea why this is the case?
The value of n is not important here, and neither is the absolute magnitude of the time needed for each operation. What is important is the ratio between the arithmetic operation and the dictionary lookup time.
Edit: I'm using ANTS Profiler to do the profiling.
Note: I just simplified my actual code into something like this; the above code snippet has not been tested yet, but I'm fairly confident it captures the gist of the problem.
It is a known fact that jagged arrays are faster than multidimensional arrays in .NET (at least on Windows):
Why are multi-dimensional arrays in .NET slower than normal arrays?
What are the differences between a multidimensional array and an array of arrays in C#?
so maybe try a jagged array instead
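A sketch of that conversion for the eigenVectors field above (the ToJagged helper is illustrative); do the copy once outside the hot loop, then index the jagged array inside it:

// One-time conversion of double[,] to double[][].
static double[][] ToJagged(double[,] source)
{
    int rows = source.GetLength(0), cols = source.GetLength(1);
    var result = new double[rows][];
    for (int i = 0; i < rows; i++)
    {
        var row = new double[cols];
        for (int j = 0; j < cols; j++)
        {
            row[j] = source[i, j];
        }
        result[i] = row;
    }
    return result;
}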
