struct array vs object array in C#

I understand that mutable structs are evil. However, I'd still like to compare the performance of an array of structs vs. an array of objects. This is what I have so far:
public struct HelloStruct
{
    public int[] hello1;
    public int[] hello2;
    public int hello3;
    public int hello4;
    public byte[] hello5;
    public byte[] hello6;
    public string hello7;
    public string hello8;
    public string hello9;
    public SomeOtherStruct[] hello10;
}

public struct SomeOtherStruct
{
    public int yoyo;
    public int yiggityyo;
}
public class HelloClass
{
    public int[] hello1;
    public int[] hello2;
    public int hello3;
    public int hello4;
    public byte[] hello5;
    public byte[] hello6;
    public string hello7;
    public string hello8;
    public string hello9;
    public SomeOtherClass[] hello10;
}

public class SomeOtherClass
{
    public int yoyo;
    public int yiggityyo;
}
static void CompareTimesClassVsStruct()
{
    HelloStruct[] a = new HelloStruct[50];
    for (int i = 0; i < a.Length; i++)
    {
        a[i] = default(HelloStruct);
    }

    HelloClass[] b = new HelloClass[50];
    for (int i = 0; i < b.Length; i++)
    {
        b[i] = new HelloClass();
    }

    Console.WriteLine("Starting now");

    // _max is an iteration-count field defined elsewhere in the class
    var s1 = Stopwatch.StartNew();
    for (int i = 0; i < _max; i++)
    {
        a[i % 50].hello1 = new int[] { 1, 2, 3, 4, i % 50 };
        a[i % 50].hello3 = i;
        a[i % 50].hello7 = (i % 100).ToString();
    }
    s1.Stop();

    var s2 = Stopwatch.StartNew();
    for (int j = 0; j < _max; j++)
    {
        b[j % 50].hello1 = new int[] { 1, 2, 3, 4, j % 50 };
        b[j % 50].hello3 = j;
        b[j % 50].hello7 = (j % 100).ToString();
    }
    s2.Stop();

    Console.WriteLine(s1.Elapsed.TotalSeconds);
    Console.WriteLine(s2.Elapsed.TotalSeconds);
    Console.Read();
}
There are a couple of things happening here that I'd like to understand.
Firstly, since the array stores structs, when I access a struct from the array using the index operator, should I get a copy of the struct or a reference to the original struct? When I inspect the array after running the code, I see the mutated struct values. Why is this so?
Secondly, when I compare the timings inside CompareTimesClassVsStruct(), I get approximately the same time for both. What is the reason for that? Is there any case in which an array of structs or an array of objects would outperform the other?
Thanks

When you access the properties of an element of an array of structs, you are NOT operating on a copy of the struct - you are operating on the struct itself. (This is NOT true of a List<SomeStruct> where you will be operating on copies, and the code in your example wouldn't even compile.)
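To illustrate, here is a minimal sketch using the HelloStruct type above; the difference is between indexing the array (a variable) and copying an element to a local:

HelloStruct[] arr = new HelloStruct[1];

arr[0].hello3 = 42;                // writes to the element inside the array itself
Console.WriteLine(arr[0].hello3);  // prints 42

HelloStruct copy = arr[0];         // assignment copies the whole struct
copy.hello3 = 99;                  // mutates only the local copy
Console.WriteLine(arr[0].hello3);  // still prints 42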
The reason you are seeing similar times is because the times are being distorted by the (j % 100).ToString() and new int[] { 1, 2, 3, 4, j % 50 }; within the loops. The amount of time taken by those two statements is dwarfing the times taken by the array element access.
I changed the test app a little, and I get times for accessing the struct array of 9.3s and the class array of 10s (for 1,000,000,000 loops), so the struct array is measurably faster, but not by a significant margin.
One thing which can make struct arrays faster to iterate over is locality of reference. When iterating over a struct array, adjacent elements are adjacent in memory, which reduces the number of processor cache misses.
The elements of class arrays are not adjacent (although the references to the elements in the array are, of course), which can result in many more processor cache misses while you iterate over the array.
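As a rough sketch of how one might measure that effect (the types and names here are mine, not from the original test):

struct PointS { public int X; }
class PointC { public int X; }

static long SumStructs(PointS[] items)
{
    long sum = 0;
    for (int i = 0; i < items.Length; i++)
        sum += items[i].X;   // sequential reads over one contiguous block of memory
    return sum;
}

static long SumClasses(PointC[] items)
{
    long sum = 0;
    for (int i = 0; i < items.Length; i++)
        sum += items[i].X;   // each element is a pointer chase to a separate heap object
    return sum;
}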
Another thing to be aware of is that the number of contiguous bytes in a struct array is effectively (number of elements) * (sizeof(element)), whereas the number of contiguous bytes in a class array is (number of elements) * (sizeof(reference)), where the size of a reference is 32 bits or 64 bits, depending on memory model.
This can be a problem with large arrays of large structs, where the total size of the array would exceed 2^31 bytes (for example, an array of 100-byte structs hits that limit at roughly 21 million elements, whereas an array of references to the same data would not).
Another difference you might see in speed is when passing large structs as parameters - obviously it will be much quicker to pass by value a copy of the reference to a reference type on the stack than to pass by value a copy of a large struct.
Finally, note that your sample struct is not very representative. It contains a lot of reference types, all of which will be stored somewhere on the heap, not in the array itself.
As a rule of thumb, structs should not be more than 32 bytes or so in size (the exact limit is a matter of debate), they should contain only primitive (blittable) types, and they should be immutable. And, usually, you shouldn't worry about making things structs anyway, unless you have a provable performance need for them.
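For instance, a struct that follows those guidelines might look like this (readonly struct requires C# 7.2 or later; on older compilers, just keep the fields readonly):

public readonly struct Point3
{
    public readonly int X, Y, Z;   // 12 bytes total, primitive fields only, immutable

    public Point3(int x, int y, int z) { X = x; Y = y; Z = z; }
}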

Firstly, since the array stores structs, when I try to access a struct from the array using the index operation, should I get a copy of the struct or a reference to the original struct?
Let me tell you what is actually happening rather than answering your confusingly worded either-or question.
Arrays are a collection of variables.
The index operation when applied to an array produces a variable.
Mutating a field of a mutable struct successfully requires that you have in hand the variable that contains the struct you wish to mutate.
So now to your question: Should you get a reference to the struct?
Yes, in the sense that a variable refers to storage.
No, in the sense that the variable does not contain a reference to an object; the struct is not boxed.
No, in the sense that the variable is not a ref variable.
However, if you had called an instance method on the result of the indexer, then a ref variable would have been produced for you; that ref variable is called "this", and it would have been passed to your instance method.
You see how confusing this gets. Better to not think about references at all. Think about variables and values. Indexing an array produces a variable.
Now deduce what would have happened had you used a list rather than an array, knowing that the getter indexer of a list produces a value, not a variable.
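Spelling that deduction out (a sketch assuming a List<HelloStruct> built from the type above):

HelloStruct[] array = new HelloStruct[1];
array[0].hello3 = 42;    // compiles: indexing an array produces a variable

List<HelloStruct> list = new List<HelloStruct> { default(HelloStruct) };
// list[0].hello3 = 42;  // error CS1612: the list indexer produces a value (a copy),
//                       // so the compiler rejects the attempt to mutate it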
In this case when I inspect the array after running the code, I get the mutated struct values. Why is this so?
You mutated a variable.
I get approximately the same time. What is the reason behind that?
The difference is so tiny that it is being swamped by all the memory allocations and memory copies you are doing in both cases. That is the real takeaway here. Are operations on mutable value types stored in arrays slightly faster? Possibly. (They save on collection pressure as well, which is often the more relevant performance metric.) But though the relative savings might be significant, the savings as a percentage of total work is often tiny. If you have a performance problem then you want to attack the most expensive thing, not something that is already cheap.

Related

How to truncate an array in place in C#

I mean, is it really possible? MSDN says that arrays are fixed-size and that the only way to resize one is to copy it to a new place. But maybe it is possible with unsafe code or some magic with internal CLR structures; they are all written in C++, where we have full memory control and can call realloc and so on.
I have no code provided for this question, because I don't even know if such a thing can exist.
I'm not talking about the Array.Resize method and so on, because it obviously does not have the needed behaviour.
Assume that we have a standard x86 process with 2 GB of RAM, and 1.9 GB of it is filled by a single array. Then I want to release half of it. So I want to write something like:
MagicClass.ResizeArray(ref arr, n)
And not get an OutOfMemoryException. Array.Resize will try to allocate another gigabyte of RAM and will fail, because 1.9 + 1 > 2 GB.
You can try Array.Resize():
int[] myArray = new int[] { 1, 2, 3, 4 };
int myNewSize = 1;
Array.Resize(ref myArray, myNewSize);
// Test: 1
Console.Write(myArray.Length);
realloc will attempt to do the inplace resize - but it reserves the right to copy the whole thing elsewhere and return a pointer that's completely different.
Pretty much the same outward behaviour is exposed by .NET's List<T> class - which you should be using anyway if you find yourself changing array sizes often. It hides the actual array reference from you so that the change is propagated throughout all of the references to the same list. As you remove items from the end, only the length of the list changes while the inner array stays the same - avoiding the copying.
It doesn't release the memory (you can always do that explicitly with Capacity = XXX, but that makes a new copy of the array), but then again, unless you're working with large arrays, neither does realloc - and if you're working with large arrays, yada, yada - we've been there :)
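A small sketch of that behaviour (using System.Linq's Enumerable.Range to fill the list):

List<int> list = new List<int>(Enumerable.Range(0, 1000000));
list.RemoveRange(500000, 500000);  // Count drops to 500000; Capacity (the inner array) is unchanged
list.Capacity = list.Count;        // only this line allocates a new, smaller array and copies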
realloc doesn't really make sense in the kind of memory model .NET has anyway - the heap is continuously collected and compacted over time. So if you're trying to use it to avoid the copies when just trimming an array, while also keeping memory usage low... don't bother. At the next heap compaction, the whole memory above your array is going to be moved to fill in the blanks. Even if it were possible to do the realloc, the only benefit you have over simply copying the array is that you would keep your array in the old-living heap - and that isn't necessarily what you want anyway.
Neither array type in the BCL supports what you want. That being said, you can implement your own type that supports what you need. It can be backed by a standard array, but would implement its own Length and indexer properties that 'hide' a portion of the array from you.
public class MyTruncatableArray<T>
{
    private T[] _array;
    private int _length;

    public MyTruncatableArray(int size)
    {
        _array = new T[size];
        _length = size;
    }

    public T this[int index]
    {
        get
        {
            CheckIndex(index);
            return _array[index];
        }
        set
        {
            CheckIndex(index);
            _array[index] = value;
        }
    }

    public int Length
    {
        get { return _length; }
        set
        {
            // the new length may be anything from 0 up to the backing array's size
            if (value < 0 || value > _array.Length)
            {
                throw new ArgumentException("New array length must be non-negative and lower than or equal to the original size");
            }
            _length = value;
        }
    }

    private void CheckIndex(int index)
    {
        // valid indexes are 0 .. _length - 1 (note: >=, not >, so index == Length is rejected)
        if (index < 0 || index >= _length)
        {
            throw new IndexOutOfRangeException();
        }
    }
}
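Usage would look something like this (note that the hidden half of the backing array is still allocated, and still roots any references it contains, so this does not actually release memory):

MyTruncatableArray<int> arr = new MyTruncatableArray<int>(1000000);
// ... fill and use arr ...
arr.Length = 500000;   // 'truncates' instantly: no allocation, no copying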
It really depends on what exactly you need. (E.g., do you need to truncate just so the array is easier to use from your code? Or are perf/GC/memory consumption a concern? If the latter, did you perform any measurements that prove the standard Array.Resize method unusable for your case?)

Converting a method to use any Enum

My Problem:
I want to convert my randomBloodType() method to a static method that can take any enum type. I want my method to take any type of enum whether it be BloodType, DaysOfTheWeek, etc. and perform the operations shown below.
Some Background on what the method does:
The method currently chooses a random element from the BloodType enum based on the values assigned to each element. An element with a higher value has a higher probability to be picked.
Code:
public enum BloodType
{
    // BloodType = Probability
    ONeg = 4,
    OPos = 36,
    ANeg = 3,
    APos = 28,
    BNeg = 1,
    BPos = 20,
    ABNeg = 1,
    ABPos = 5
};
public BloodType randomBloodType()
{
    // Get the values of the BloodType enum and store them in an array
    BloodType[] bloodTypeValues = (BloodType[])Enum.GetValues(typeof(BloodType));
    List<BloodType> bloodTypeList = new List<BloodType>();

    // Create a list where each element occurs the number of
    // times given by its value (probability weight)
    foreach (BloodType val in bloodTypeValues)
    {
        for (int i = 0; i < (int)val; i++)
        {
            bloodTypeList.Add(val);
        }
    }

    // Sum the values
    int sum = 0;
    foreach (BloodType val in bloodTypeValues)
    {
        sum += (int)val;
    }

    // Get a random value
    // (note: a new Random() per call can repeat sequences when called in quick succession)
    Random rand = new Random();
    int randomValue = rand.Next(sum);
    return bloodTypeList[randomValue];
}
What I have tried so far:
I have tried to use generics. They worked out for the most part, but I was unable to cast my enum elements to int values. I included a example of a section of code that was giving me problems below.
foreach (T val in bloodTypeValues)
{
    sum += (int)val; // This line is the problem.
}
I have also tried using Enum e as a method parameter. I was unable to declare the type of my array of enum elements using this method.
(Note: My apologies in advance for the lengthy answer. My actual proposed solution is not all that long, but there are a number of problems with the proposed solutions so far and I want to try to address those thoroughly, to provide context for my own proposed solution).
In my opinion, while you have in fact accepted one answer and might be tempted to use either one, neither of the answers provided so far are correct or useful.
Commenter Ben Voigt has already pointed out two major flaws with your specifications as stated, both related to the fact that you are encoding the enum value's weight in the value itself:
You are tying the enum's underlying type to the code that then must interpret that type.
Two enum values that have the same weight are indistinguishable from each other.
Both of these issues can be addressed. Indeed, while the answer you accepted (why?) fails to address the first issue, the one provided by Dweeberly does address this through the use of Convert.ToInt32() (which can convert from long to int just fine, as long as the values are small enough).
But the second issue is much harder to address. The answer from Asad attempts to address this by starting with the enum names and parsing them to their values. And this does indeed result in the final array being indexed containing the corresponding entries for each name separately. But the code actually using the enum has no way to distinguish the two; it's really as if those two names are a single enum value, and that single enum value's probability weight is the sum of the value used for the two different names.
I.e. in your example, while the enum entries for e.g. BNeg and ABNeg will be selected separately, the code that receives these randomly selected value has no way to know whether it was BNeg or ABNeg that was selected. As far as it knows, those are just two different names for the same value.
Now, even this problem can be addressed (but not in the way that Asad attempts to…his answer is still broken). If you were, for example, to encode the probabilities in the value while still ensuring unique values for each name, you could decode those probabilities while doing the random selection and that would work. For example:
enum BloodType
{
    // BloodType = Probability
    ONeg = 4 * 100 + 0,
    OPos = 36 * 100 + 1,
    ANeg = 3 * 100 + 2,
    APos = 28 * 100 + 3,
    BNeg = 1 * 100 + 4,
    BPos = 20 * 100 + 5,
    ABNeg = 1 * 100 + 6,
    ABPos = 5 * 100 + 7,
};
Having declared your enum values that way, then you can in your selection code divide the enum value by 100 to obtain its probability weight, which then can be used as seen in the various examples. At the same time, each enum name has a unique value.
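The decoding step is just integer division and remainder:

BloodType value = BloodType.OPos;  // 36 * 100 + 1 == 3601
int weight = (int)value / 100;     // 36, the probability weight
int uniquifier = (int)value % 100; // 1, the part that keeps the value unique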
But even solving that problem, you are still left with problems related to the choice of encoding and representation of the probabilities. For example, in the above you cannot have an enum that has more than 100 values, nor one with weights larger than (2^31 - 1) / 100; if you want an enum that has more than 100 values, you need a larger multiplier but that would limit your weight values even more.
In many scenarios (maybe all the ones you care about) this won't be an issue. The numbers are small enough that they all fit. But that seems like a serious limitation in what seems like a situation where you want a solution that is as general as possible.
And that's not all. Even if the encoding stays within reasonable limits, you have another significant limit to deal with: the random selection process requires an array large enough to contain, for each enum value, as many instances of that value as its weight. Again, if the values are small maybe this is not a big problem. But it does severely limit the ability of your implementation to generalize.
So, what to do?
I understand the temptation to try to keep each enum type self-contained; there are some obvious advantages to doing so. But there are also some serious disadvantages that result from that, and if you truly ever try to use this in a generalized way, the changes to the solutions proposed so far will tie your code together in ways that IMHO negate most if not all of the advantage of keeping the enum types self-contained (primarily: if you find you need to modify the implementation to accommodate some new enum type, you will have to go back and edit all of the other enum types you're using…i.e. while each type looks self-contained, in reality they are all tightly coupled with each other).
In my opinion, a much better approach would be to abandon the idea that the enum type itself will encode the probability weights. Just accept that this will be declared separately somehow.
Also, IMHO it would be better to avoid the memory-intensive approach proposed in your original question and mirrored in the other two answers. Yes, this is fine for the small values you're dealing with here. But it's an unnecessary limitation, making only one small part of the logic simpler while complicating and restricting it in other ways.
I propose the following solution, in which the enum values can be whatever you want, the enum's underlying type can be whatever you want, and the algorithm uses memory proportionally only to the number of unique enum values, rather than in proportion to the sum of all of the probability weights.
In this solution, I also address possible performance concerns, by caching the invariant data structures used to select the random values. This may or may not be useful in your case, depending on how frequently you will be generating these random values. But IMHO it is a good idea regardless; the up-front cost of generating these data structures is so high that if the values are selected with any regularity at all, it will begin to dominate the run-time cost of your code. Even if it works fine today, why take the risk? (Again, especially given that you seem to want a generalized solution).
Here is the basic solution:
static T NextRandomEnumValue<T>()
{
    KeyValuePair<T, int>[] aggregatedWeights = GetWeightsForEnum<T>();
    int weightedValue =
        _random.Next(aggregatedWeights[aggregatedWeights.Length - 1].Value),
        index = Array.BinarySearch(aggregatedWeights,
            new KeyValuePair<T, int>(default(T), weightedValue),
            KvpValueComparer<T, int>.Instance);

    return aggregatedWeights[index < 0 ? ~index : index + 1].Key;
}
static KeyValuePair<T, int>[] GetWeightsForEnum<T>()
{
    object temp;

    if (_typeToAggregatedWeights.TryGetValue(typeof(T), out temp))
    {
        return (KeyValuePair<T, int>[])temp;
    }

    if (!_typeToWeightMap.TryGetValue(typeof(T), out temp))
    {
        throw new ArgumentException("Unsupported enum type");
    }

    KeyValuePair<T, int>[] weightMap = (KeyValuePair<T, int>[])temp;
    KeyValuePair<T, int>[] aggregatedWeights =
        new KeyValuePair<T, int>[weightMap.Length];
    int sum = 0;

    for (int i = 0; i < weightMap.Length; i++)
    {
        sum += weightMap[i].Value;
        aggregatedWeights[i] = new KeyValuePair<T, int>(weightMap[i].Key, sum);
    }

    _typeToAggregatedWeights[typeof(T)] = aggregatedWeights;
    return aggregatedWeights;
}
readonly static Random _random = new Random();

// Helper method to reduce verbosity in the enum-to-weight array declarations
static KeyValuePair<T1, T2> CreateKvp<T1, T2>(T1 t1, T2 t2)
{
    return new KeyValuePair<T1, T2>(t1, t2);
}

readonly static KeyValuePair<BloodType, int>[] _bloodTypeToWeight =
{
    CreateKvp(BloodType.ONeg, 4),
    CreateKvp(BloodType.OPos, 36),
    CreateKvp(BloodType.ANeg, 3),
    CreateKvp(BloodType.APos, 28),
    CreateKvp(BloodType.BNeg, 1),
    CreateKvp(BloodType.BPos, 20),
    CreateKvp(BloodType.ABNeg, 1),
    CreateKvp(BloodType.ABPos, 5),
};

readonly static Dictionary<Type, object> _typeToWeightMap =
    new Dictionary<Type, object>()
    {
        { typeof(BloodType), _bloodTypeToWeight },
    };

readonly static Dictionary<Type, object> _typeToAggregatedWeights =
    new Dictionary<Type, object>();
Note that the work of actually selecting a random value is simply a matter of choosing a non-negative random integer less than the sum of the weights, and then using a binary search to find the appropriate enum value.
Once per enum type, the code will build the table of values and weight-sums that will be used for the binary search. This result is stored in a cache dictionary, _typeToAggregatedWeights.
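To make that concrete: with the BloodType weights declared above, the aggregated sums are 4, 40, 43, 71, 72, 92, 93, 98. A random draw of, say, 17 is not an exact match, so BinarySearch returns the complement of index 1; ~index recovers 1, and OPos (which covers the range 4..39) is selected. Calling it is just:

BloodType bt = NextRandomEnumValue<BloodType>();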
There are also the objects that have to be declared and which will be used at run-time to build this table. Note that the _typeToWeightMap is just in support of making this method 100% generic. If you wanted to write a different named method for each specific type you wanted to support, that could still used a single generic method to implement the initialization and selection, but the named method would know the correct object (e.g. _bloodTypeToWeight) to use for initialization.
Alternatively, another way to avoid the _typeToWeightMap while still keeping the method 100% generic would be to have the _typeToAggregatedWeights be of type Dictionary<Type, Lazy<object>>, and have the values of the dictionary (the Lazy<object> objects) explicitly reference the appropriate weight array for the type.
In other words, there are lots of variations on this theme that would work fine. But they will all have essentially the same structure as above; semantics would be the same and performance differences would be negligible.
One thing you'll notice is that the binary search requires a custom IComparer<T> implementation. That is here:
class KvpValueComparer<TKey, TValue> :
    IComparer<KeyValuePair<TKey, TValue>> where TValue : IComparable<TValue>
{
    public readonly static KvpValueComparer<TKey, TValue> Instance =
        new KvpValueComparer<TKey, TValue>();

    private KvpValueComparer() { }

    public int Compare(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
    {
        return x.Value.CompareTo(y.Value);
    }
}
This allows the Array.BinarySearch() method to correctly compare the array elements, allowing a single array to contain both the enum values and their aggregated weights while limiting the binary search comparison to just the weights.
Assuming your enum values are all of type int (you can adjust this accordingly if they're long, short, or whatever):
static TEnum RandomEnumValue<TEnum>(Random rng)
{
    var vals = Enum
        .GetNames(typeof(TEnum))
        .Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
        {
            var value = Enum.Parse(typeof(TEnum), curr);
            return agg.Concat(Enumerable.Repeat((TEnum)value, (int)value)); // For int enums
        })
        .ToArray();

    return vals[rng.Next(vals.Length)];
}
Here's how you would use it:
var rng = new Random();
var randomBloodType = RandomEnumValue<BloodType>(rng);
People seem to have their knickers in a knot about multiple indistinguishable enum values in the input enum (for which I still think the above code provides expected behavior). Note that there is no answer here, not even Peter Duniho's, that will allow you to distinguish enum entries when they have the same value, so I'm not sure why this is being considered as a metric for any potential solutions.
Nevertheless, an alternative approach that doesn't use the enum values as probabilities is to use an attribute to specify the probability:
public enum BloodType
{
    [P(4)]
    ONeg,
    [P(36)]
    OPos,
    [P(3)]
    ANeg,
    [P(28)]
    APos,
    [P(1)]
    BNeg,
    [P(20)]
    BPos,
    [P(1)]
    ABNeg,
    [P(5)]
    ABPos
}
Here is what the attribute used above looks like:
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public class PAttribute : Attribute
{
    public int Weight { get; private set; }

    public PAttribute(int weight)
    {
        Weight = weight;
    }
}
and finally, this is what the method to get a random enum value would look like:
static TEnum RandomEnumValue<TEnum>(Random rng)
{
    var vals = Enum
        .GetNames(typeof(TEnum))
        .Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
        {
            var value = Enum.Parse(typeof(TEnum), curr);
            FieldInfo fi = typeof(TEnum).GetField(curr);
            var weight = ((PAttribute)fi.GetCustomAttribute(typeof(PAttribute), false)).Weight;
            return agg.Concat(Enumerable.Repeat((TEnum)value, weight)); // For int enums
        })
        .ToArray();

    return vals[rng.Next(vals.Length)];
}
(Note: if this code is performance critical, you might need to tweak this and add caching for the reflection data).
Some of this you can do and some of it isn't so easy. I believe the following extension method will do what you describe.
static public class Util {
    static Random rnd = new Random();

    static public int PriorityPickEnum(this Enum e) {
        // The approved types for an enum are byte, sbyte, short, ushort, int, uint, long, or ulong.
        // However, Random only supports an int (or double) as a max value. Either way
        // it doesn't have the range for uint, long and ulong.

        // sum the enum values
        int sum = 0;
        foreach (var x in Enum.GetValues(e.GetType())) {
            sum += Convert.ToInt32(x);
        }

        var i = rnd.Next(sum); // get a random value; it will form a ratio i / sum

        // enums may not have a uniform (incremented) value range (think about flags),
        // therefore we have to step through to get to the range we want;
        // this is due to the requirement that the return value have a probability
        // proportional to its value. Note enum values must be sorted for this to work.
        foreach (var x in Enum.GetValues(e.GetType()).OfType<Enum>().OrderBy(a => a)) {
            i -= Convert.ToInt32(x);
            if (i <= 0) return Convert.ToInt32(x);
        }

        throw new Exception("This doesn't seem right");
    }
}
Here is an example of using this extension:
BloodType bt = BloodType.ABNeg;
for (int i = 0; i < 100; i++) {
    var v = (BloodType) bt.PriorityPickEnum();
    Console.WriteLine("{0}: {1}({2})", i, v, (int) v);
}
This should work pretty well for enums with an underlying type of byte, sbyte, ushort, short or int. Once you get beyond int (uint, long, ulong) the problem is the Random class. You can adjust the code to use doubles generated by Random, which would cover uint, but the Random class just doesn't have the range to cover long and ulong. Of course you could use/find/write a different Random class if this is important.
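For example, one hedged workaround (my sketch, not part of the answer above) is to scale NextDouble(), accepting that the result is not perfectly uniform and loses precision above 2^53:

static long NextLong(Random rng, long maxExclusive)
{
    return (long)(rng.NextDouble() * maxExclusive);  // approximate; adequate for rough weighting
}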

Store for loop value in array

I want to write a function that returns an array whose values are filled in by a for loop. I tried this:
public int[] a()
{
    int[] b = new int[] { };
    for (int i = 0; i < 10; i++)
    {
        b[i] = i; // Index out of range exception occurs here
    }
    return b;
}
I don't want to use Enumerable.Range() because of performance concerns.
I want to declare the array empty, without a fixed size.
In your case you need an array with 10 elements in it. In some languages you could do what you are trying to do (JavaScript being one). Let's assume you could extend an array in C#: your code would then allocate space for one element at a time, once per iteration of the loop, resulting in ten separate allocations. At best, this would be as fast as allocating 10 elements in one go.
In practice it won't be, and it's certainly never going to be faster than a single request that allocates all of them at once. In other words, there's no performance gain to be found by not simply allocating all 10 elements in one go:
public int[] a()
{
    int[] b = new int[10];
    for (int i = 0; i < b.Length; i++)
    {
        b[i] = i;
    }
    return b;
}
However a much more readable approach would be
public int[] a()
{
    return Enumerable.Range(0, 10).ToArray();
}
int[] b = new int[] { }; means your array b is zero-length. You get an index out of range exception on b[i] = i because there are no elements: you're effectively doing b[0] = 0, but element 0 does not exist.
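If the reason for wanting to start from an empty array is that the final size isn't known up front, the usual alternative is a List<int>, which grows its backing array for you; a sketch:

public int[] a()
{
    List<int> b = new List<int>();
    for (int i = 0; i < 10; i++)
    {
        b.Add(i);          // the list grows its inner array as needed
    }
    return b.ToArray();    // one final copy into a right-sized array
}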

Mutable struct vs. class?

I'm unsure about whether to use a mutable struct or a mutable class.
My program stores an array with a lot of objects.
I've noticed that using a class doubles the amount of memory needed. However, I want the objects to be mutable, and I've been told that using mutable structs is evil.
This is what my type looks like:
struct /* or class */ Block
{
    public byte ID;
    public bool HasMetaData; // not sure whether HasMetaData == false or
                             // MetaData == null is faster, might remove this
    public BlockMetaData MetaData; // BlockMetaData is always a class reference
}
Allocating a large number of objects like this (note that both snippets below are run 81 times):
// struct
Block[,,] blocks = new Block[16, 256, 16];
uses about 35 MiB of memory, whilst doing it like this:
// class
Block[,,] blocks = new Block[16, 256, 16];
for (int z = 0; z < 16; z++)
for (int y = 0; y < 256; y++)
for (int x = 0; x < 16; x++)
blocks[x, y, z] = new Block();
uses about 100 MiB of RAM.
So to conclude, my question is as follows:
Should I use a struct or a class for my Block type? Instances should be mutable and store a few values plus one object reference.
First off, if you really want to save memory then don't be using a struct or a class.
byte[,,] blockTypes = new byte[16, 256, 16];
BlockMetaData[,,] blockMetadata = new BlockMetaData[16, 256, 16];
You want to tightly pack similar things together in memory. You never want to put a byte next to a reference in a struct if you can possibly avoid it; such a struct will waste three to seven bytes automatically. References have to be word-aligned in .NET.
Second, I'm assuming that you're building a voxel system here. There might be a better way to represent the voxels than a 3-d array, depending on their distribution. If you are going to be making a truly enormous number of these things then store them in an immutable octree. By using the persistence properties of the immutable octree you can make cubic structures with quadrillions of voxels in them so long as the universe you are representing is "clumpy". That is, there are large regions of similarity throughout the world. You trade somewhat larger O(lg n) time for accessing and changing elements, but you get to have way, way more elements to work with.
Third, "ID" is a really bad way to represent the concept of "type". When I see "ID" I assume that the number uniquely identifies the element, not describes it. Consider changing the name to something less confusing.
Fourth, how many of the elements have metadata? You can probably do far better than an array of references if the number of elements with metadata is small compared to the total number of elements. Consider a sparse array approach; sparse arrays are much more space efficient.
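A sketch of that sparse idea (the names and the flattening scheme here are mine, not from the answer): keep the per-block byte in a dense array, and store metadata only for the blocks that actually have it:

static int Flatten(int x, int y, int z) { return (x * 256 + y) * 16 + z; }

byte[,,] blockTypes = new byte[16, 256, 16];
Dictionary<int, BlockMetaData> metaData = new Dictionary<int, BlockMetaData>();

// presence in the dictionary doubles as the HasMetaData flag:
BlockMetaData meta;
bool hasMeta = metaData.TryGetValue(Flatten(3, 70, 12), out meta);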
Do they really have to be mutable? You could always make it an immutable struct with methods to create a new value with one field different:
struct Block
{
    // I'd definitely get rid of the HasMetaData
    private readonly byte id;
    private readonly BlockMetaData metaData;

    public Block(byte id, BlockMetaData metaData)
    {
        this.id = id;
        this.metaData = metaData;
    }

    public byte Id { get { return id; } }
    public BlockMetaData MetaData { get { return metaData; } }

    public Block WithId(byte newId)
    {
        return new Block(newId, metaData);
    }

    public Block WithMetaData(BlockMetaData newMetaData)
    {
        return new Block(id, newMetaData);
    }
}
I'm still not sure whether I'd make it a struct, to be honest - but I'd try to make it immutable either way, I suspect.
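With the immutable struct, a 'mutation' becomes a write of a whole new value back into the array element (which, as discussed in the first question above, is a variable):

blocks[x, y, z] = blocks[x, y, z].WithId(5);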
What are your performance requirements in terms of both memory and speed? How close does an immutable class come to those requirements?
An array of structs will offer better storage efficiency than an array of immutable references to distinct class instances having the same fields, because the latter will require all of the memory required by the former, in addition to memory to manage the class instances and memory required to hold the references.
All that having been said, your struct as designed has a very inefficient layout. If you're really concerned about space, and every item in fact needs to independently store a byte, a Boolean, and a class reference, your best bet may be to either have two arrays of byte (a byte is actually smaller than a Boolean) and an array of class references, or else have an array of bytes, an array with 1/32 as many elements of something like BitVector32, and an array of class references.
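A sketch of that last layout (BitVector32 lives in System.Collections.Specialized; the sizes are taken from the question's 16x256x16 grid):

const int N = 16 * 256 * 16;                     // 65536 blocks
byte[] ids = new byte[N];                        // 1 byte per block
BitVector32[] flags = new BitVector32[N / 32];   // 1 bit per block for HasMetaData
BlockMetaData[] metaData = new BlockMetaData[N]; // 1 reference per block

// BitVector32's indexer takes a bit mask, so block i's flag is read like this:
static bool HasMetaData(BitVector32[] flags, int i)
{
    return flags[i / 32][1 << (i % 32)];
}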

IEnumerable<T> ToArray usage - Is it a copy or a pointer?

I am parsing an arbitrary length byte array that is going to be passed around to a few different layers of parsing. Each parser creates a Header and a Packet payload just like any ordinary encapsulation.
My problem lies in how the encapsulation holds its packet byte array payload. Say I have a 100-byte array with three levels of encapsulation. Three packet objects will be created, and I want to set the payload of each packet to the corresponding position in the byte array.
For example, let's say the payload size is 20 for all levels, and imagine each object has a public byte[] Payload. The problem is that each byte[] Payload is a copy out of the original 100 bytes, so I'm going to end up with 160 bytes in memory instead of 100.
If it were in C++, I could just easily use a pointer - however, I'm writing this in C#.
So I created the following class:
public class PayloadSegment<T> : IEnumerable<T>
{
    public readonly T[] Array;
    public readonly int Offset;
    public readonly int Count;

    public PayloadSegment(T[] array, int offset, int count)
    {
        this.Array = array;
        this.Offset = offset;
        this.Count = count;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= this.Count)
                throw new IndexOutOfRangeException();
            else
                return Array[Offset + index];
        }
        set
        {
            if (index < 0 || index >= this.Count)
                throw new IndexOutOfRangeException();
            else
                Array[Offset + index] = value;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = Offset; i < Offset + Count; i++)
            yield return Array[i];
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
This way I can simply reference a position inside the original byte array but use positional indexing. However, if I do something like:
PayloadSegment<byte> something = new PayloadSegment<byte>(someArray, 5, 10);
byte[] somethingArray = something.ToArray();
Will the somethingArray be a copy of the bytes, or a reference to the original PayloadSegment (which in turn is a reference to the original byte array)?
EDIT: Actually after rethinking this, can't I simply use a new MemoryStream(array, offset, length)?
The documentation for the Enumerable.ToArray extension method doesn't specifically mention what it does when it's passed a sequence that happens to already be an array. But a simple check with .NET Reflector reveals that it does indeed create a copy of the array.
It is worth noting however that when given a sequence that implements ICollection<T> (which Array does) the copy can be done much faster because the number of elements is known up front so it does not have to do dynamic resizing of the buffer such as List<T> does.
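A quick check of that behaviour against the PayloadSegment class from the question:

byte[] source = { 1, 2, 3, 4, 5 };
PayloadSegment<byte> segment = new PayloadSegment<byte>(source, 1, 3);

byte[] detached = segment.ToArray();  // a new 3-element array: { 2, 3, 4 }
detached[0] = 99;                     // does not touch 'source'
segment[0] = 88;                      // writes through to source[1]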
There is a very strong practice which suggests that calling "ToArray" on an object should return a new array which is detached from anything else. Nothing that is done to the original object should affect the array, and nothing which is done to the array should affect the original object. My personal preference would have been to call the routine "ToNewArray", to make explicit that each call will return a different new array.
A few of my classes have an AsReadableArray, which returns an array that may or may not be attached to anything else. The array won't change in response to manipulations of the original object, but it's possible that multiple reads yielding the same data (which they often will) will return the same array. I really wish .NET had an ImmutableArray type, supporting the same sorts of operations as String [a String, in essence, being an ImmutableArray(Of Char)], and a ReadableArray abstract type (from which both Array and ImmutableArray would inherit). I doubt such a thing could be squeezed into .NET 5.0, but it would allow a lot of things to be done much more cleanly.
It is a copy. When you call a To<Type> method (ToArray, ToList, ...), it creates a new collection of the target type, populated with the elements of the source.
Because byte is a value type, the array will hold copies of the values, not pointers to them.
If you need the same behaviour as a reference type, it is best to create a class that holds the byte as a property, and which may group other data and functionality.
It's a copy. It would be very unintuitive if I passed something.ToArray() to some method, and the method changed the value of something by changing the array!
