Trying to find a solution to my ranking problem.
Basically, I have two multi-dimensional double[,] arrays, both containing rankings for certain scenarios, indexed as [rank number, scenario number]. More than one scenario can have the same rank.
I want to generate a third multi-dimensional array, taking the intersections of the previous two multi-dimensional arrays to provide a joint ranking.
Does anyone have an idea how I can do this in C#?
Many thanks for any advice or help you can provide!
Edit:
Thank you for all the responses; sorry, I should have included an example.
Here it is:
Array One:
[{0,4},{1,0},{1,2},{2,1},{3,5},{4,3}]
Array Two:
[{0,1},{0,4},{1,0},{1,2},{3,5},{4,3}]
Required Result:
[{0,4},{1,0},{1,2},{1,1},{2,5},{3,3}]
Here's some sample code that makes a bunch of assumptions but might be something like what you are looking for. I've added a few comments as well:
static double[,] Intersect(double[,] a1, double[,] a2)
{
    // Assumptions:
    //   a1 and a2 are two-dimensional arrays of the same size
    //   An element in the array matches if and only if its value is found in the same location in both arrays
    //   The result will contain not-a-number (NaN) for non-matches
    double[,] result = new double[a1.GetLength(0), a1.GetLength(1)];
    for (int i = 0; i < a1.GetLength(0); i++)
    {
        for (int j = 0; j < a1.GetLength(1); j++)
        {
            if (a1[i, j] == a2[i, j])
            {
                result[i, j] = a1[i, j];
            }
            else
            {
                result[i, j] = double.NaN;
            }
        }
    }
    return result;
}
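For instance, a quick hypothetical usage (the arrays here are made up for illustration):

double[,] a = { { 0, 4 }, { 1, 0 } };
double[,] b = { { 0, 4 }, { 1, 2 } };
double[,] joined = Intersect(a, b);
// joined is { { 0, 4 }, { 1, NaN } } since the arrays differ at [1, 1]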
For the most part, finding the intersection of multi-dimensional arrays will involve iterating over the elements in each of the dimensions. If the indices of the array are not part of the match criteria (i.e. my second assumption above is removed), you would have to walk each dimension in each array, which increases the run-time of the algorithm (in this case, from O(n^2) to O(n^4)).
If you care enough about run-time, I believe array matching is one of the typical examples of dynamic programming (DP) optimization, which you can read up on at your leisure.
I'm not sure how you wanted your results. You could probably return a flat collection of results that can be indexed by a pair, which would potentially save a lot of space if the expected result set is typically small. I went with a third fixed-size array because it was the easiest thing to do.
Lastly, I'll mention that I don't see a keen C# way of doing this using IEnumerable, LINQ, or something like that. Someone more knowledgeable in C# than I am can chime in anytime now....
Given the additional information, I'd argue that you aren't actually working with multidimensional arrays, but with a collection of pairs, where each pair holds two doubles. I think the following should work nicely:
public class Pair : IEquatable<Pair>
{
    public double Rank;
    public double Scenario;

    public bool Equals(Pair p)
    {
        // guard against null so the LINQ set operations stay safe
        return p != null && Rank == p.Rank && Scenario == p.Scenario;
    }

    public override int GetHashCode()
    {
        int hashRank = Rank.GetHashCode();
        int hashScenario = Scenario.GetHashCode();
        return hashRank ^ hashScenario;
    }
}
You can then use the Intersect operator on IEnumerable:
List<Pair> one = new List<Pair>();
List<Pair> two = new List<Pair>();
// ... populate the lists
List<Pair> result = one.Intersect(two).ToList();
Check out the following MSDN article on Enumerable.Intersect() for more information:
http://msdn.microsoft.com/en-us/library/bb910215%28v=vs.90%29.aspx
Related
I need to understand, on a practical level, how to create a matrix[][] in C# without knowing its size up front,
and consequently also how to modify it (delete elements depending on a search key).
I have an example loop with two random string variables, but beyond that I am not able to continue:
private static Random random = new Random();

for (int i = 0; i < unKnown; i++)
{
    var firstVar = RandomString(5);
    var secondVar = RandomString(20);
    //Matrix[][]
}

public static string RandomString(int length)
{
    const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}
Thank you
Arrays are fixed-size; they do not adjust their size automatically. The size is defined when creating the array, e.g.
string[] array = new string[10];
If your array is two-dimensional (10x10) and you delete the value at (1,1), the array still remains 10x10, but the field at (1,1) is now null.
If you need a solution that adjusts its size you might want to look into Lists.
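For instance, a rough sketch of a growable two-dimensional structure (the values here are made up):

var rows = new List<List<string>>();
rows.Add(new List<string> { "a", "b" }); // grows as needed
rows[0].RemoveAt(1);                     // shrink a row
rows.RemoveAt(0);                        // remove a whole row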
Otherwise, I advise you to read the documentation.
It really depends on what you want to do.
If you want a 2D array of values you can use multidimensional arrays. These support arbitrary dimensions, but as the number of dimensions grows the data size tends to go up quickly, and other solutions might be preferable:
var matrix = new double[4, 2];
If you want to do math you might want to use a library like Math.Net with specialized matrix types:
var matrix = Matrix<double>.Build.Dense(4, 2);
If you want to do computer graphics you likely want to use a specialized library, like System.Numerics.Matrix4x4:
var matrix = new Matrix4x4();
It is also not particularly difficult to create your own matrix class that wraps a regular array. This has the benefit that interoperability is often easier, since most frameworks and tools accept pointers or 1D arrays, while few can handle a multidimensional array. Indexing can be done like:
public class MyMatrix<T>
{
    public int Width { get; }
    public T[] Data { get; }

    public T this[int x, int y]
    {
        get => Data[y * Width + x];
        set => Data[y * Width + x] = value;
    }
}
There are also jagged arrays, but there is no guarantee that these will be "square", so they are probably not appropriate if you want a "matrix".
In all cases you will need to loop over the matrix and check each element if you want to do any kind of replacement. Some alternatives require separate loops for width/height, while some allow for a single loop.
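As a rough sketch of that loop-and-check approach for a 2D array (the search key here is made up):

string[,] matrix = new string[10, 10];
string searchKey = "ABC12"; // hypothetical key
for (int i = 0; i < matrix.GetLength(0); i++)
{
    for (int j = 0; j < matrix.GetLength(1); j++)
    {
        if (matrix[i, j] == searchKey)
        {
            matrix[i, j] = null; // arrays cannot shrink, so clear the slot instead
        }
    }
}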
I'm not sure if you want a matrix or an array.
Matrix would be like
string[,] matrix = new string[10, 10];
and array would be like
string[] array = new string[10];
You can access the array with array[i] and the matrix with matrix[i, j].
You could also use

List<List<string>> matrix = new List<List<string>>();

which may be more convenient to work with and can also be accessed with indexers. For example:

matrix[i][j] = "bob";
matrix[i].RemoveAt(j);
Given the problem you have submitted maybe just a List<string> would work for you.
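For instance, a minimal sketch of that idea, storing the two random strings as pairs and deleting by key with RemoveAll (the key is made up):

var rows = new List<Tuple<string, string>>();
rows.Add(Tuple.Create(RandomString(5), RandomString(20)));
rows.RemoveAll(r => r.Item1 == "ABC12"); // delete entries matching a search key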
Is there any better (less complex) way to paste a smaller array[,] into a bigger array[,] than looping through them? My code:
private void PasteRoomIntoTheMap(GameObject[,] room, (int, int) positionOnTheMap)
{
    (int, int) roomSize = (room.GetLength(0), room.GetLength(1));
    int roomsXAxesDimention = 0;
    for (int i = positionOnTheMap.Item1; i < positionOnTheMap.Item1 + roomSize.Item1; i++)
    {
        // reset the room's column index for every new row
        int roomsYAxesDimention = 0;
        for (int j = positionOnTheMap.Item2; j < positionOnTheMap.Item2 + roomSize.Item2; j++)
        {
            Map[i, j] = room[roomsXAxesDimention, roomsYAxesDimention];
            roomsYAxesDimention++;
        }
        roomsXAxesDimention++;
    }
}
(I didn't run it yet. There might be some errors but I hope that you will understand this method)
The code that I would love to have:
Map [5...15, 2...5] = room[0...room.GetLength(0), 0...room.GetLength(1)]
Short answer: No
Longer answer:
Multidimensional arrays are really a wrapper around a 1D array, with some special code to handle indices etc. They do, however, lack some useful features.
You could perhaps improve your example code a bit by copying entire rows using Array.Copy or Buffer.BlockCopy instead of processing item by item. This might help performance if you have very many items to copy, but is unlikely to make any difference if the sizes are small.
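For instance, a hedged sketch of row-wise copying with Array.Copy, which treats a 2D array as its flattened 1D backing store (names are illustrative; shown with int[,], and both arrays must share an element type):

static void PasteRows(int[,] map, int[,] room, int offsetX, int offsetY)
{
    int mapWidth = map.GetLength(1);
    int roomWidth = room.GetLength(1);
    for (int x = 0; x < room.GetLength(0); x++)
    {
        // linear index of the first element of this row in each array
        long src = (long)x * roomWidth;
        long dst = (long)(offsetX + x) * mapWidth + offsetY;
        Array.Copy(room, src, map, dst, roomWidth);
    }
}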
You could probably provide the syntax you desire by creating something like an ArraySlice2D class that references the original 2D array as well as the region you want to copy. You would need to either create your own wrapper to provide index operators, or create extension methods to do it. Perhaps something like:

public class My2DArray<T>
{
    private T[,] array;

    public ArraySlice2D<T> this[Range x, Range y]
    {
        get => new ArraySlice2D<T>(array, x, y);
        set
        {
            // copy values from value's X/Y ranges into the x/y ranges
        }
    }
}
You might also consider writing your own 2D array class that wraps a 1D array. I find that this often makes interoperability easier since most systems can handle 1D arrays, but few handle multidimensional arrays without conversion.
I am not aware of any shortcut for that, and there might be too many slightly distinct use cases for a one-size-fits-all solution, so the most appropriate approach would be to implement your own functionality.
It is advisable, however, to add a size check to your method so you don't end up indexing nonexistent areas of the big array:
private bool ReplaceWithSubArray(int[,] array, int[,] subarray, (int x, int y) indices)
{
    if (array.GetLength(0) < subarray.GetLength(0) + indices.x ||
        array.GetLength(1) < subarray.GetLength(1) + indices.y)
    {
        // 'array' too small
        return false;
    }
    for (int x = 0; x < subarray.GetLength(0); x++)
    {
        for (int y = 0; y < subarray.GetLength(1); y++)
        {
            array[x + indices.x, y + indices.y] = subarray[x, y];
        }
    }
    return true;
}
I'm trying to initialize a three-dimensional array to load a voxel world.
The total size of the map should be 2048/1024/2048. I tried to initialize a jagged array of int, but it throws an OutOfMemoryException. What is the size limit?
Size of my array: 2048 * 1024 * 2048 = 4'294'967'296 elements
Does anyone know a way around this problem?
// System.OutOfMemoryException here!
int[][][] matrice = CreateJaggedArray<int[][][]>(2048, 1024, 2048);

// normal initialization also throws the exception
int[,,] matrice = new int[2048, 1024, 2048];
static T CreateJaggedArray<T>(params int[] lengths)
{
    return (T)InitializeJaggedArray(typeof(T).GetElementType(), 0, lengths);
}

static object InitializeJaggedArray(Type type, int index, int[] lengths)
{
    Array array = Array.CreateInstance(type, lengths[index]);
    Type elementType = type.GetElementType();
    if (elementType != null)
    {
        for (int i = 0; i < lengths[index]; i++)
        {
            array.SetValue(
                InitializeJaggedArray(elementType, index + 1, lengths), i);
        }
    }
    return array;
}
The maximum size of a single object in C# is 2 GB. Since you are creating a multi-dimensional array rather than a jagged array (despite the name of your method), it is a single object that needs to contain all of those items, not several. If you actually used a jagged array, then you wouldn't have a single item with all of that data (even though the total memory footprint would be a tad larger, not smaller; it's just spread out more).
Thank you so much to everyone who tried to help me understand and solve my problem.
I tried several solutions to be able to load a lot of data and store it in an array.
After two days, here are my tests and, finally, the solution that can store 4'294'967'296 entries in one array.
I add my final solution, hoping it could help someone.
The goal
I recall the goal: initialize an integer array [2048/1024/2048] to store 4'294'967'296 entries.
Test 1: with JaggedArray method (failure)
System.OutOfMemoryException thrown
/* ******************** */
/* Jagged Array method  */
/* ******************** */

// allocate the first dimension
int[][][] bigData = new int[2048][][];
for (int x = 0; x < 2048; x++)
{
    // allocate the second dimension
    bigData[x] = new int[1024][];
    for (int y = 0; y < 1024; y++)
    {
        // allocate the last dimension
        bigData[x][y] = new int[2048];
    }
}
Test 2: with List method (failure)
System.OutOfMemoryException thrown (dividing the big array into several smaller arrays does not work, because List<> unfortunately has the same 2 GB allocation limit as a simple array).
/* ******************** */
/* List method          */
/* ******************** */

List<int[,,]> bigData = new List<int[,,]>(512);
for (int a = 0; a < 512; a++)
{
    bigData.Add(new int[256, 128, 256]);
}
Test 3: with MemoryMappedFile (Solution)
I finally found the solution!
Use the MemoryMappedFile class, which maps the contents of a file into virtual memory.
MemoryMappedFile MSDN
Use it with the custom class that I found on CodeProject here. The initialization takes a while, but it works well!
/* ************************ */
/* MemoryMappedFile method  */
/* ************************ */

string path = AppDomain.CurrentDomain.BaseDirectory;
var myList = new GenericMemoryMappedArray<int>(2048L * 1024L * 2048L, path);
using (myList)
{
    myList.AutoGrow = false;
    /*
    for (long a = 0; a < (2048L * 1024L * 2048L); a++)
    {
        myList[a] = (int)a;
    }
    */
    myList[12456] = 8;
    myList[1939848234] = 1;
    // etc...
}
From the MSDN documentation on Arrays:
By default, the maximum size of an Array is 2 gigabytes (GB). In a 64-bit environment, you can avoid the size restriction by setting the enabled attribute of the gcAllowVeryLargeObjects configuration element to true in the run-time environment. However, the array will still be limited to a total of 4 billion elements, and to a maximum index of 0X7FEFFFFF in any given dimension (0X7FFFFFC7 for byte arrays and arrays of single-byte structures).
So despite the above answers, even if you set the flag to allow a larger object size, an array is still limited to a 32-bit count of elements.
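For reference, that flag is set in the application's configuration file; a minimal sketch:

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>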
EDIT: You'll likely have to redesign to eliminate the need for a multidimensional array as you're currently using it (as others have suggested, there are a few ways to do this, between using actual jagged arrays or some other collection of dimensions). Given the scale of the number of elements, it may be best to use a design that dynamically allocates objects/memory as used, instead of arrays that have to pre-allocate it (unless you don't mind using many gigabytes of memory). That is, perhaps you can define data structures that hold only the filled content, rather than every possible voxel in the world including the "empty" ones (I'm assuming the vast majority of voxels are "empty" rather than "filled").
EDIT: Although not trivial, especially if most of the space is considered "empty", your best bet would be to introduce some sort of spatial tree that lets you efficiently query your world to see what objects are in a particular area. For example: octrees (as Eric suggested) or R-trees.
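As a minimal sketch of that sparse idea, storing only the filled voxels (the names and values are made up):

var filled = new Dictionary<(int x, int y, int z), int>();
filled[(12, 456, 78)] = 8;                        // set a voxel
filled.TryGetValue((12, 456, 78), out int value); // an absent key yields 0, i.e. empty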
Creating this object as described, either as a standard array or as a jagged array, is going to destroy the locality of reference that allows your CPU to be performant. I recommend you use a structure like this instead:
class BigArray
{
    private ArrayCell[,,] arrayCell = new ArrayCell[32, 16, 32];

    private ArrayCell GetCell(int i, int j, int k)
    {
        // allocate cells lazily so untouched regions cost no memory
        return arrayCell[i / 64, j / 64, k / 64]
            ?? (arrayCell[i / 64, j / 64, k / 64] = new ArrayCell());
    }

    public int this[int i, int j, int k]
    {
        get { return GetCell(i, j, k)[i % 64, j % 64, k % 64]; }
        set { GetCell(i, j, k)[i % 64, j % 64, k % 64] = value; }
    }
}

class ArrayCell
{
    private int[,,] cell = new int[64, 64, 64];

    public int this[int i, int j, int k]
    {
        get { return cell[i, j, k]; }
        set { cell[i, j, k] = value; }
    }
}
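Hypothetical usage: the wrapper indexes the full 2048 x 1024 x 2048 space, but each cell is only a 64^3 block of ints (about 1 MB), well under the single-object limit:

var world = new BigArray();
world[1000, 500, 1500] = 42;
int v = world[1000, 500, 1500]; // 42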
I have several sorted sequences of numbers of type long (in ascending order) and want to generate one master sequence that contains all elements in the same order. I am looking for the most efficient algorithm to solve this problem. I target C#/.NET 4.0, and thus also welcome ideas exploiting parallelism.
Here is an example:
s1 = 1,2,3,5,7,13
s2 = 2,3,6
s3 = 4,5,6,7,8
resulting Sequence = 1,2,2,3,3,4,5,5,6,6,7,7,8,13
Edit: When there are two (or more) identical values then the order of those two (or more) does not matter.
Just merge the sequences. You do not have to sort them again.
There is no .NET Framework method that I know of to do a K-way merge. Typically, it's done with a priority queue (often a heap). It's not difficult to do, and it's quite efficient. Given K sorted lists, together holding N items, the complexity is O(N log K).
I show a simple binary heap class in my article A Generic Binary Heap Class. In Sorting a Large Text File, I walk through the creation of multiple sorted sub-files and using the heap to do the K-way merge. Given an hour (perhaps less) of study, you can probably adapt that for use in your program.
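If you'd rather not adapt the heap class, here is a rough .NET 4.0-compatible sketch of the same K-way merge using SortedSet as the priority queue (names are illustrative; the source index in the tuple keeps equal values distinct):

static IEnumerable<long> MergeK(IList<IEnumerable<long>> sequences)
{
    var heads = new SortedSet<Tuple<long, int>>();
    var iters = new IEnumerator<long>[sequences.Count];
    for (int i = 0; i < sequences.Count; i++)
    {
        iters[i] = sequences[i].GetEnumerator();
        if (iters[i].MoveNext())
        {
            heads.Add(Tuple.Create(iters[i].Current, i));
        }
    }
    while (heads.Count > 0)
    {
        var min = heads.Min;             // smallest pending value
        heads.Remove(min);
        yield return min.Item1;
        if (iters[min.Item2].MoveNext()) // advance the sequence it came from
        {
            heads.Add(Tuple.Create(iters[min.Item2].Current, min.Item2));
        }
    }
}

Each source contributes at most one entry to the set at a time, so the (value, index) pairs never collide, and the whole merge runs in O(N log K).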
You just have to merge your sequences like in a merge sort.
And this is parallelizable:
merge sequences (1 and 2 in 1/2), (3 and 4 in 3/4), …
merge sequences (1/2 and 3/4 in 1/2/3/4), (5/6 and 7/8 in 5/6/7/8), …
…
Here is the merge function (with a bounds check on the second sequence, so it keeps working once one of them is exhausted):

int j = 0;
int k = 0;
for (int i = 0; i < size_merged_seq; i++)
{
    // take from seq1 while it has items and its head is not larger,
    // or whenever seq2 has run out
    if (j < size_seq1 && (k >= size_seq2 || seq1[j] < seq2[k]))
    {
        merged_seq[i] = seq1[j];
        j++;
    }
    else
    {
        merged_seq[i] = seq2[k];
        k++;
    }
}
The easy way is to merge them with each other one by one. However, this requires O(n*k^2) time, where k is the number of sequences and n is the average number of items per sequence. With a divide and conquer approach you can lower this to O(n*k*log k). The algorithm is as follows (a short sketch follows the list):
Divide the k sequences into k/2 groups of 2 sequences each (plus 1 group of 1 sequence if k is odd).
Merge the sequences in each group, giving you k/2 new sequences.
Repeat until you get a single sequence.
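A short sketch of this scheme under the same assumptions (the two-way Merge helper is named here for illustration):

static List<long> Merge(List<long> a, List<long> b)
{
    var result = new List<long>(a.Count + b.Count);
    int i = 0, j = 0;
    while (i < a.Count || j < b.Count)
    {
        // take from a while it has items and its head is not larger
        if (j >= b.Count || (i < a.Count && a[i] <= b[j]))
            result.Add(a[i++]);
        else
            result.Add(b[j++]);
    }
    return result;
}

static List<long> MergeAll(List<List<long>> seqs)
{
    while (seqs.Count > 1)                         // one round per iteration
    {
        var next = new List<List<long>>();
        for (int i = 0; i + 1 < seqs.Count; i += 2)
            next.Add(Merge(seqs[i], seqs[i + 1])); // merge adjacent pairs
        if (seqs.Count % 2 == 1)
            next.Add(seqs[seqs.Count - 1]);        // odd one carries over
        seqs = next;
    }
    return seqs.Count == 1 ? seqs[0] : new List<long>();
}

The Merge calls within a round are independent, so they could run in parallel, e.g. with Parallel.For.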
UPDATE:
Turns out that, even with all these algorithms, the simple way is still faster:
private static List<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedBunches)
{
    var list = sortedBunches.SelectMany(bunch => bunch).ToList();
    list.Sort();
    return list;
}
And for legacy purposes...
Here is the final version, which works by keeping the enumerators ordered by their current values:
private static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedInts) where T : IComparable<T>
{
    var enumerators = new List<IEnumerator<T>>(
        sortedInts.Select(ints => ints.GetEnumerator()).Where(e => e.MoveNext()));
    enumerators.Sort((e1, e2) => e1.Current.CompareTo(e2.Current));
    while (enumerators.Count > 1)
    {
        yield return enumerators[0].Current;
        if (enumerators[0].MoveNext())
        {
            // sift the advanced enumerator down to its sorted position,
            // so the list stays ordered even with three or more sequences
            var e = enumerators[0];
            int i = 0;
            while (i + 1 < enumerators.Count && e.Current.CompareTo(enumerators[i + 1].Current) > 0)
            {
                enumerators[i] = enumerators[i + 1];
                i++;
            }
            enumerators[i] = e;
        }
        else
        {
            enumerators.RemoveAt(0);
        }
    }
    if (enumerators.Count == 1)
    {
        do
        {
            yield return enumerators[0].Current;
        } while (enumerators[0].MoveNext());
    }
}
I found that dictionary lookup can be very slow compared to flat array access. Any idea why? I'm using ANTS Profiler for performance testing. Here's a sample function that reproduces the problem:
private static void NodeDisplace()
{
    var nodeDisplacement = new Dictionary<double, double[]>();
    var times = new List<double>();
    for (int i = 0; i < 6000; i++)
    {
        times.Add(i * 0.02);
    }
    foreach (var time in times)
    {
        nodeDisplacement.Add(time, new double[6]);
    }
    var five = 5;
    var six = 6;
    int modes = 10;
    var arrayList = new double[times.Count * 6];
    for (int i = 0; i < modes; i++)
    {
        int k = 0;
        foreach (var time in times)
        {
            for (int j = 0; j < 6; j++)
            {
                var simpelCompute = five * six;            // 0.027 sec
                nodeDisplacement[time][j] = simpelCompute; // 0.403 sec
                arrayList[6 * k + j] = simpelCompute;      // 0.0278 sec
            }
            k++;
        }
    }
}
Notice the relative magnitudes of flat array access and dictionary access? Flat array access is roughly 15 times faster than dictionary access (0.403/0.0278), after taking into account the array index manipulation (6*k+j).
As weird as it sounds, dictionary lookup is taking a major portion of my time, and I have to optimize it.
Yes, I'm not surprised. The point of dictionaries is that they're used to look up arbitrary keys. Consider what has to happen for a single array dereference:
Check bounds
Multiply index by element size
Add index to pointer
Very, very fast. Now for a dictionary lookup (very rough; depends on implementation):
Potentially check key for nullity
Take hash code of key
Find the right slot for that hash code (probably a "mod prime" operation)
Probably dereference an array element to find the information for that slot
Compare hash codes
If the hash codes match, compare for equality (and potentially go on to the next hash code match)
If you've got "keys" which can very easily be used as array indexes instead (e.g. contiguous integers, or something which can easily be mapped to contiguous integers) then that will be very, very fast. That's not the primary use case for hash tables. They're good for situations which can't easily be mapped that way - for example looking up by string, or by arbitrary double value (rather than doubles which are evenly spaced, and can thus be mapped to integers easily).
I would say that your title is misleading - it's not that dictionary lookup is slow, it's that when arrays are a more suitable approach, they're ludicrously fast.
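In this particular case the keys are evenly spaced (time = i * 0.02), so they map straight to array indexes; a minimal sketch of that mapping (names are made up):

const double Step = 0.02;
var table = new double[6000][]; // slot i corresponds to key i * Step
for (int i = 0; i < table.Length; i++)
{
    table[i] = new double[6];
}

double time = 3.14;                                 // hypothetical lookup key
double[] row = table[(int)Math.Round(time / Step)]; // O(1), no hashing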
In addition to Jon's answer, I would like to add that your inner loop does not do very much; normally you do at least some more work in the inner loop, and then the relative performance loss of the dictionary is somewhat lower.
If you look at the code for Double.GetHashCode() in Reflector you'll find that it executes 4 lines of code (assuming your double is not 0), and just that is more than the body of your inner loop. Dictionary<TKey, TValue>.Insert() (called by the set indexer) is even more code, almost a screen full.
The thing about Dictionary compared to a flat array is that you don't waste too much memory when your keys are not dense (as they are in your case), and that reads and writes are ~O(1), like arrays (but with a higher constant).
As a side note, you can use a multi-dimensional array instead of the 6*k+j trick.
Declare it this way
var arrayList = new double[times.Count, 6];
and use it this way
arrayList[k, j] = simpelCompute;
It won't be faster, but it is easier to read.