I mean, is it really possible? MSDN says that arrays are fixed-size and that the only way to resize one is to copy it to a new location. But maybe it is possible with unsafe code or some magic with internal CLR structures; the CLR itself is written in C++, where we have full control of memory and can call realloc and so on.
I have no code to show for this question, because I don't even know whether such a thing can exist.
I'm not talking about the Array.Resize method and so on, because those obviously do not have the needed behaviour.
Assume that we have a standard x86 process with 2 GB of address space, and 1.9 GB of it is filled by a single array. Then I want to release half of it. So I want to write something like:
MagicClass.ResizeArray(ref arr, n)
And not get an OutOfMemoryException. Array.Resize will try to allocate another gigabyte of RAM and will fail, because 1.9 + 1 > 2 GB.
You can try Array.Resize():
int[] myArray = new int[] { 1, 2, 3, 4 };
int myNewSize = 1;
Array.Resize(ref myArray, myNewSize);
// Test: 1
Console.Write(myArray.Length);
realloc will attempt to resize in place, but it reserves the right to copy the whole thing elsewhere and return a pointer that's completely different.
Pretty much the same outward behaviour is exposed by .NET's List<T> class - which you should be using anyway if you find yourself changing array sizes often. It hides the actual array reference from you so that the change is propagated throughout all of the references to the same list. As you remove items from the end, only the length of the list changes while the inner array stays the same - avoiding the copying.
It doesn't release the memory (you can always do that explicitly with Capacity = XXX, but that makes a new copy of the array), but then again, unless you're working with large arrays, neither does realloc - and if you're working with large arrays, yada, yada - we've been there :)
realloc doesn't really make sense in the kind of memory model .NET has anyway; the heap is continuously collected and compacted over time. So if you're trying to use it to avoid the copies when just trimming an array, while also keeping memory usage low... don't bother. At the next heap compaction, the whole memory above your array is going to be moved to fill in the blanks. Even if it were possible to do the realloc, the only benefit over simply copying the array is that you would keep your array in the older generation of the heap, and that isn't necessarily what you want anyway.
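To make the trimming point above concrete, here is a small sketch: removing items from the end of a List&lt;T&gt; changes only its Count, and the backing array (Capacity) is untouched until you explicitly ask for a trim:

```csharp
using System;
using System.Collections.Generic;

class ListTrimSketch
{
    static void Main()
    {
        var list = new List<int>(new int[1000]);
        int capacityBefore = list.Capacity;

        list.RemoveRange(500, 500);    // drop the upper half

        Console.WriteLine(list.Count);                        // 500
        Console.WriteLine(list.Capacity == capacityBefore);   // True: no copy happened

        list.TrimExcess();             // only this reallocates to a smaller array
        Console.WriteLine(list.Capacity);                     // 500
    }
}
```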
Neither array type in the BCL supports what you want. That being said, you can implement your own type that does. It can be backed by a standard array, but implement its own Length property and indexer that 'hide' a portion of the array from you.
public class MyTruncatableArray<T>
{
    private T[] _array;
    private int _length;

    public MyTruncatableArray(int size)
    {
        _array = new T[size];
        _length = size;
    }

    public T this[int index]
    {
        get
        {
            CheckElementIndex(index);
            return _array[index];
        }
        set
        {
            CheckElementIndex(index);
            _array[index] = value;
        }
    }

    public int Length
    {
        get { return _length; }
        set
        {
            if (value < 0 || value > _array.Length)
                throw new ArgumentException(
                    "New array length must be positive and lower or equal to original size");
            _length = value;
        }
    }

    private void CheckElementIndex(int index)
    {
        // >= rather than >: index == _length must throw for element access
        if (index < 0 || index >= _length)
            throw new IndexOutOfRangeException();
    }
}
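Usage then looks like this; note that shrinking and re-growing Length never allocates, because the backing array is left alone:

```csharp
var arr = new MyTruncatableArray<int>(10);
for (int i = 0; i < arr.Length; i++)
    arr[i] = i;

arr.Length = 5;                  // logical truncation: no allocation, no copy
Console.WriteLine(arr.Length);   // 5
Console.WriteLine(arr[4]);       // 4

arr.Length = 10;                 // can grow back up to the original size
Console.WriteLine(arr[9]);       // 9 - the hidden values were never touched
```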
It really depends on what exactly you need. (E.g. do you need to truncate just so that you can use the array more easily from your code? Or is perf/GC/memory consumption a concern? If the latter, did you perform any measurements proving the standard Array.Resize method unusable for your case?)
I understand that mutable structs are evil. However, I'd still like to compare the performance of an array of structs vs an array of objects. This is what I have so far:
public struct HelloStruct
{
    public int[] hello1;
    public int[] hello2;
    public int hello3;
    public int hello4;
    public byte[] hello5;
    public byte[] hello6;
    public string hello7;
    public string hello8;
    public string hello9;
    public SomeOtherStruct[] hello10;
}

public struct SomeOtherStruct
{
    public int yoyo;
    public int yiggityyo;
}

public class HelloClass
{
    public int[] hello1;
    public int[] hello2;
    public int hello3;
    public int hello4;
    public byte[] hello5;
    public byte[] hello6;
    public string hello7;
    public string hello8;
    public string hello9;
    public SomeOtherClass[] hello10;
}

public class SomeOtherClass
{
    public int yoyo;
    public int yiggityyo;
}
private const int _max = 100000000; // loop count; the declaration was missing from the snippet

static void compareTimesClassVsStruct()
{
    HelloStruct[] a = new HelloStruct[50];
    for (int i = 0; i < a.Length; i++)
    {
        a[i] = default(HelloStruct);
    }

    HelloClass[] b = new HelloClass[50];
    for (int i = 0; i < b.Length; i++)
    {
        b[i] = new HelloClass();
    }

    Console.WriteLine("Starting now");

    var s1 = Stopwatch.StartNew();
    for (int i = 0; i < _max; i++)
    {
        a[i % 50].hello1 = new int[] { 1, 2, 3, 4, i % 50 };
        a[i % 50].hello3 = i;
        a[i % 50].hello7 = (i % 100).ToString();
    }
    s1.Stop();

    var s2 = Stopwatch.StartNew();
    for (int j = 0; j < _max; j++)
    {
        b[j % 50].hello1 = new int[] { 1, 2, 3, 4, j % 50 };
        b[j % 50].hello3 = j;
        b[j % 50].hello7 = (j % 100).ToString();
    }
    s2.Stop();

    Console.WriteLine(((double)(s1.Elapsed.TotalSeconds)));
    Console.WriteLine(((double)(s2.Elapsed.TotalSeconds)));
    Console.Read();
}
There's a couple of things happening here that I'd like to understand.
Firstly, since the array stores structs, when I try to access a struct from the array using the index operation, should I get a copy of the struct or a reference to the original struct? In this case when I inspect the array after running the code, I get the mutated struct values. Why is this so?
Secondly, when I compare the timings inside CompareTimesClassVsStruct() I get approximately the same time. What is the reason behind that? Is there any case under which using an array of structs or an array of objects would outperform the other?
Thanks
When you access the properties of an element of an array of structs, you are NOT operating on a copy of the struct - you are operating on the struct itself. (This is NOT true of a List<SomeStruct> where you will be operating on copies, and the code in your example wouldn't even compile.)
The reason you are seeing similar times is because the times are being distorted by the (j % 100).ToString() and new int[] { 1, 2, 3, 4, j % 50 }; within the loops. The amount of time taken by those two statements is dwarfing the times taken by the array element access.
I changed the test app a little, and I get times for accessing the struct array of 9.3s and the class array of 10s (for 1,000,000,000 loops), so the struct array is noticeably faster, but pretty insignificantly so.
One thing which can make struct arrays faster to iterate over is locality of reference. When iterating over a struct array, adjacent elements are adjacent in memory, which reduces the number of processor cache misses.
The elements of class arrays are not adjacent (although the references to the elements in the array are, of course), which can result in many more processor cache misses while you iterate over the array.
Another thing to be aware of is that the number of contiguous bytes in a struct array is effectively (number of elements) * (sizeof(element)), whereas the number of contiguous bytes in a class array is (number of elements) * (sizeof(reference)) where the size of a reference is 32 bits or 64 bits, depending on memory model.
This can be a problem with large arrays of large structs where the total size of the array would exceed 2^31 bytes.
Another difference you might see in speed is when passing large structs as parameters - obviously it will be much quicker to pass by value a copy of the reference to a reference type on the stack than to pass by value a copy of a large struct.
Finally, note that your sample struct is not very representative. It contains a lot of reference types, all of which will be stored somewhere on the heap, not in the array itself.
As a rule of thumb, structs should not be more than 32 bytes or so in size (the exact limit is a matter of debate), they should contain only primitive (blittable) types, and they should be immutable. And, usually, you shouldn't worry about making things structs anyway, unless you have a provable performance need for them.
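If you want to see the locality-of-reference effect in isolation, a sketch along these lines (no allocations or ToString calls inside the timed loops; the type names are made up for the example) makes the gap visible. Exact numbers will vary by machine:

```csharp
using System;
using System.Diagnostics;

class LocalitySketch
{
    struct PointS { public int X, Y; }
    class PointC { public int X, Y; }

    static void Main()
    {
        const int N = 10_000_000;
        var structs = new PointS[N];            // one contiguous block of memory
        var classes = new PointC[N];            // an array of references
        for (int i = 0; i < N; i++) classes[i] = new PointC();

        long sum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) sum += structs[i].X;   // sequential reads
        sw.Stop();
        Console.WriteLine($"struct[] sum: {sw.ElapsedMilliseconds} ms");

        sum = 0;
        sw.Restart();
        for (int i = 0; i < N; i++) sum += classes[i].X;   // one pointer hop per element
        sw.Stop();
        Console.WriteLine($"class[] sum:  {sw.ElapsedMilliseconds} ms");
    }
}
```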
Firstly, since the array stores structs, when I try to access a struct from the array using the index operation, should I get a copy of the struct or a reference to the original struct?
Let me tell you what is actually happening rather than answering your confusingly worded either-or question.
Arrays are a collection of variables.
The index operation when applied to an array produces a variable.
Mutating a field of a mutable struct successfully requires that you have in hand the variable that contains the struct you wish to mutate.
So now to your question: Should you get a reference to the struct?
Yes, in the sense that a variable refers to storage.
No, in the sense that the variable does not contain a reference to an object; the struct is not boxed.
No, in the sense that the variable is not a ref variable.
However, if you had called an instance method on the result of the indexer, then a ref variable would have been produced for you; that ref variable is called "this", and it would have been passed to your instance method.
You see how confusing this gets. Better to not think about references at all. Think about variables and values. Indexing an array produces a variable.
Now deduce what would have happened had you used a list rather than an array, knowing that the getter indexer of a list produces a value, not a variable.
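A concrete illustration of "variable vs value": the struct-array line compiles and mutates in place, while the equivalent List&lt;T&gt; line is rejected by the compiler, precisely because the list's indexer getter hands you a copy:

```csharp
using System;
using System.Collections.Generic;

struct Mutable { public int X; }

class VariableVsValue
{
    static void Main()
    {
        var array = new Mutable[1];
        array[0].X = 42;               // OK: array[0] is a variable
        Console.WriteLine(array[0].X); // 42

        var list = new List<Mutable> { new Mutable() };
        // list[0].X = 42;             // error CS1612: cannot modify the return
        //                             // value of the indexer, it is not a variable

        var copy = list[0];            // copy the value out...
        copy.X = 42;
        list[0] = copy;                // ...and store the whole struct back
        Console.WriteLine(list[0].X);  // 42
    }
}
```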
In this case when I inspect the array after running the code, I get the mutated struct values. Why is this so?
You mutated a variable.
I get approximately the same time. What is the reason behind that?
The difference is so tiny that it is being swamped by all the memory allocations and memory copies you are doing in both cases. That is the real takeaway here. Are operations on mutable value types stored in arrays slightly faster? Possibly. (They save on collection pressure as well, which is often the more relevant performance metric.) But though the relative savings might be significant, the savings as a percentage of total work is often tiny. If you have a performance problem then you want to attack the most expensive thing, not something that is already cheap.
I want my cake and to eat it.
I like the way Lists in C# dynamically expand when you go beyond the initial capacity of the array. However this is not enough. I want to be able to do something like this:
int[] n = new int[]; // Note how I'm NOT defining how big the array is.
n[5] = 9;
Yes, there'll be some sacrifice in speed, because behind the scenes, .NET would need to check to see if the default capacity has been exceeded. If it has, then it could expand the array by 5x or so.
Unfortunately with Lists you're not really meant to set an arbitrary element, and although it is possible, you still can't set, say, the fifth element straight away without first setting the size of the List, let alone have it expand dynamically when you try.
For any solution, I'd like to be able to keep the simple square bracket syntax (rather than using a relatively verbose-looking method call), and have it relatively fast (preferably almost as fast as standard arrays) when it's not expanding the array.
Note that I don't necessarily advocate inheriting List, but if you really want this:
public class MyList<T> : List<T>
{
    public new T this[int i]
    {
        get
        {
            while (i >= this.Count) this.Add(default(T));
            return base[i];
        }
        set
        {
            while (i >= this.Count) this.Add(default(T));
            base[i] = value;
        }
    }
}
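Used through a MyList&lt;T&gt;-typed variable it behaves like the auto-growing array you asked for. One caveat: List&lt;T&gt;'s indexer isn't virtual, so this indexer only kicks in when the variable's static type is MyList&lt;T&gt;, not List&lt;T&gt;:

```csharp
var n = new MyList<int>();
n[5] = 9;                      // grows the list with default(int) up to index 5
Console.WriteLine(n.Count);    // 6
Console.WriteLine(n[5]);       // 9
Console.WriteLine(n[2]);       // 0 - the padding value
```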
I'll add that if you expect most of the values of your "array" to remain empty over the life of your program, you'll get much greater efficiency by using a Dictionary<int, T>, especially as the size of the collection grows large.
A simple solution to the problem is to inherit from Dictionary<TKey, TValue> and just use the value generic:
public class MyCoolType<T> : Dictionary<int, T> { }
Then you would be able to use it like:
MyCoolType<int> n = new MyCoolType<int>();
n[5] = 9;
And a note on performance.
For insertions, this is much faster than a list, since it does not require you to resize the backing array or shift elements around. List<T> uses an array as a backing field, and resizing it is expensive. (Edit: Lists have a default capacity and you aren't resizing on every insert, but when you do, it's expensive.)
For look-ups by key, this is very nearly O(1) (source), so comparable to an array look-up. (To be fair, indexing into a List<T> is also O(1); it's the inserts and resizes that get progressively slower as the number of contained elements grows.)
Sparse packing is much more memory efficient than dense packing with a List, as it doesn't require you to add empty items just to reach a specific index.
Other Notes:
In the other solutions, try inserting an item at index 570442959, for example: you'll get an OutOfMemoryException (under 32-bit; even 64-bit has problems). With this solution you can use any conceivable index the int type supports, up to int.MaxValue.
Lists don't allow negative indexes, this will.
MyCoolType.Count is the equivalent of the array Length property here.
Here are the results of my performance test:
Inserting 1 million elements into MyList: 29.4294424 seconds
Inserting 1 million elements into CoolType: 0.127499 seconds
Looking up 1 million random elements MyList: 1.6330562 seconds
Looking up 1 million random elements CoolType: 1.304348 seconds
Full source to tests here: http://pastebin.com/kEdLgFaw
Note, to run these tests I had to set to X64 build, debug, and had to add the following to the app.config file:
<runtime>
<gcAllowVeryLargeObjects enabled="true" />
</runtime>
Here is a quick helper:
static public List<int> AddToList(int index, int value, List<int> input)
{
    if (index >= input.Count)
    {
        // pad with default values up to the requested index
        int[] temparray = new int[index - input.Count + 1];
        input.AddRange(temparray);
    }
    input[index] = value;
    return input;
}
You can define an extension method on List:
public static class ExtensionMethods {
    public static void Set<T>(this List<T> list, int index, T element) {
        if (index < list.Count) {
            list[index] = element;
        } else {
            for (int i = list.Count; i < index; i++) {
                list.Add(default(T));
            }
            list.Add(element);
        }
    }
}
and call list.Set(12, 1024) if you want the 12th element to be 1024.
I'm unsure about whether to use a mutable struct or a mutable class.
My program stores an array with a lot of objects.
I've noticed that using a class doubles the amount of memory needed. However, I want the objects to be mutable, and I've been told that using mutable structs is evil.
This is what my type looks like:
struct /* or class */ Block
{
public byte ID;
public bool HasMetaData; // not sure whether HasMetaData == false or
// MetaData == null is faster, might remove this
public BlockMetaData MetaData; // BlockMetaData is always a class reference
}
Allocating a large amount of objects like this (notice that both codes below are run 81 times):
// struct
Block[,,] blocks = new Block[16, 256, 16];
uses about 35 MiB of memory, whilst doing it like this:
// class
Block[,,] blocks = new Block[16, 256, 16];
for (int z = 0; z < 16; z++)
    for (int y = 0; y < 256; y++)
        for (int x = 0; x < 16; x++)
            blocks[x, y, z] = new Block();
uses about 100 MiB of ram.
So to conclude, my question is as follows:
Should I use a struct or a class for my Block type? Instances should be mutable and store a few values plus one object reference.
First off, if you really want to save memory then don't be using a struct or a class.
byte[,,] blockTypes = new byte[16, 256, 16];
BlockMetaData[,,] blockMetadata = new BlockMetaData[16, 256, 16];
You want to tightly pack similar things together in memory. You never want to put a byte next to a reference in a struct if you can possibly avoid it; such a struct will waste three to seven bytes automatically. References have to be word-aligned in .NET.
Second, I'm assuming that you're building a voxel system here. There might be a better way to represent the voxels than a 3-d array, depending on their distribution. If you are going to be making a truly enormous number of these things then store them in an immutable octree. By using the persistence properties of the immutable octree you can make cubic structures with quadrillions of voxels in them so long as the universe you are representing is "clumpy". That is, there are large regions of similarity throughout the world. You trade somewhat larger O(lg n) time for accessing and changing elements, but you get to have way, way more elements to work with.
Third, "ID" is a really bad way to represent the concept of "type". When I see "ID" I assume that the number uniquely identifies the element, not describes it. Consider changing the name to something less confusing.
Fourth, how many of the elements have metadata? You can probably do far better than an array of references if the number of elements with metadata is small compared to the total number of elements. Consider a sparse array approach; sparse arrays are much more space efficient.
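Combining the parallel-array idea from the first paragraph with a sparse dictionary for the (presumably rare) metadata might look like the sketch below. The Chunk wrapper and BlockMetaData contents are made up for illustration, and the C# 7 tuple key is just one convenient choice:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for whatever per-block extras you need.
class BlockMetaData { public string Tag; }

class Chunk
{
    // One tightly packed byte per voxel: 16 * 256 * 16 = 65,536 bytes total.
    private readonly byte[,,] types = new byte[16, 256, 16];

    // Sparse: only blocks that actually have metadata pay for an entry.
    private readonly Dictionary<(int x, int y, int z), BlockMetaData> metadata =
        new Dictionary<(int x, int y, int z), BlockMetaData>();

    public byte TypeAt(int x, int y, int z) => types[x, y, z];
    public void SetTypeAt(int x, int y, int z, byte t) => types[x, y, z] = t;

    public BlockMetaData MetaDataAt(int x, int y, int z) =>
        metadata.TryGetValue((x, y, z), out var m) ? m : null;

    public void SetMetaDataAt(int x, int y, int z, BlockMetaData m)
    {
        if (m == null) metadata.Remove((x, y, z));
        else metadata[(x, y, z)] = m;
    }
}
```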
Do they really have to be mutable? You could always make it an immutable struct with methods to create a new value with one field different:
struct Block
{
    // I'd definitely get rid of the HasMetaData
    private readonly byte id;
    private readonly BlockMetaData metaData;

    public Block(byte id, BlockMetaData metaData)
    {
        this.id = id;
        this.metaData = metaData;
    }

    public byte Id { get { return id; } }
    public BlockMetaData MetaData { get { return metaData; } }

    public Block WithId(byte newId)
    {
        return new Block(newId, metaData);
    }

    public Block WithMetaData(BlockMetaData newMetaData)
    {
        return new Block(id, newMetaData);
    }
}
I'm still not sure whether I'd make it a struct, to be honest - but I'd try to make it immutable either way, I suspect.
What are your performance requirements in terms of both memory and speed? How close does an immutable class come to those requirements?
An array of structs will offer better storage efficiency than an array of immutable references to distinct class instances having the same fields, because the latter will require all of the memory required by the former, in addition to memory to manage the class instances and memory required to hold the references. All that having been said, your struct as designed has a very inefficient layout. If you're really concerned about space, and every item in fact needs to independently store a byte, a Boolean, and a class reference, your best bet may be to either have two arrays of byte (a byte is actually smaller than a Boolean) and an array of class references, or else have an array of bytes, an array with 1/32 as many elements of something like BitVector32, and an array of class references.
I am planning to implement a bounded queue without using the Queue<T> class. After reading pros and cons of Arrays and LinkedList<T>, I am leaning more towards using Array to implement queue functionality. The collection will be fixed size. I just want to add and remove items from the queue.
something like
public class BoundedQueue<T>
{
    private T[] queue;
    int queueSize;

    public BoundedQueue(int size)
    {
        this.queueSize = size;
        queue = new T[size + 1];
    }
}
instead of
public class BoundedQueue<T>
{
    private LinkedList<T> queue;
    int queueSize;

    public BoundedQueue(int size)
    {
        this.queueSize = size;
        queue = new LinkedList<T>();
    }
}
I have selected Array because of efficiency and due to the fact that collection is fixed size. Would like to get other opinions on this. Thanks.
Well, it would be a mistake. I'm going to guess you are a former C/C++ programmer, to whom std::list is king. On the surface it is incredibly frugal with memory; you can't make a list more efficient than one that allocates only the memory it needs, right? Yes, LinkedList<T> does that.
But no: it is incredibly incompatible with the way CPU caches work. They really like arrays and hate pointers. Put the garbage collector on top of that, which is quite capable of packing memory tightly, and arrays win.
The read-them-and-weep benchmarks are here. Stark.
You should of course use a Queue<T>, but in the question you said that you don't want to use Queue<T> and instead want to implement the queue yourself. You need to consider your use case for this class first. If you want to implement something quickly you can use a LinkedList<T>, but for a general-purpose library you would want something faster.
You can see how it is implemented in .NET by using .NET Reflector. These are the fields it has:
private T[] _array;
private const int _DefaultCapacity = 4;
private static T[] _emptyArray;
private const int _GrowFactor = 200;
private int _head;
private const int _MinimumGrow = 4;
private const int _ShrinkThreshold = 0x20;
private int _size;
[NonSerialized]
private object _syncRoot;
private int _tail;
private int _version;
As you can see it uses an array. It is also quite complicated, with many fields concerned with how the array should be resized. Even if you are implementing a bounded queue, you would want to allow the backing array to be larger than the logical capacity, to avoid constantly moving items in memory.
Regarding thread safety, neither type offers any guarantees. For example in the documentation for LinkedList<T> it says this:
This type is not thread safe. If the LinkedList needs to be accessed by multiple threads, you will need to implement their own synchronization mechanism.
I'm not sure why you'd rule out using a Queue<T> internally, especially considering you're up for using a LinkedList<T> (they're in the same assembly). A Queue<T> would give you the greatest performance and memory usage. Your class could look something like this:
public class BoundedQueue<T>
{
    private Queue<T> _queue;
    private int _maxSize;

    public BoundedQueue(int maxSize)
    {
        if (maxSize <= 0)
            throw new ArgumentOutOfRangeException("maxSize");
        _queue = new Queue<T>(maxSize);
        _maxSize = maxSize;
    }

    public int Count
    {
        get { return _queue.Count; }
    }

    /// <summary>
    /// Adds a new item to the queue and, if the queue is at its
    /// maximum capacity, also removes the oldest item
    /// </summary>
    /// <returns>
    /// True if an item was dequeued during this operation;
    /// otherwise, false
    /// </returns>
    public bool EnqueueDequeue(T value, out T dequeued)
    {
        dequeued = default(T);
        bool dequeueOccurred = false;

        if (_queue.Count == _maxSize)
        {
            dequeued = _queue.Dequeue();
            dequeueOccurred = true;
        }

        _queue.Enqueue(value);
        return dequeueOccurred;
    }
}
But maybe you had a good reason for ruling out the Queue<T> class that I just can't think of?
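For completeness, here is how the wrapper above behaves at the boundary: once the queue is full, each new item pushes the oldest one out:

```csharp
var q = new BoundedQueue<int>(3);
int dropped;

q.EnqueueDequeue(1, out dropped);                    // queue: [1]
q.EnqueueDequeue(2, out dropped);                    // queue: [1, 2]
q.EnqueueDequeue(3, out dropped);                    // queue: [1, 2, 3] - full

bool wasDropped = q.EnqueueDequeue(4, out dropped);  // queue: [2, 3, 4]
Console.WriteLine(wasDropped);  // True
Console.WriteLine(dropped);     // 1 - the oldest item was pushed out
Console.WriteLine(q.Count);     // 3
```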
You can use an array; you just have to keep track of the index of the head element (otherwise you'd have to move everything down by one each time you add something). Arrays are good for access by index; linked lists are good for next/previous traversal and fast insertion.
For instance, if you have [1, 2, 3, 4, 5] with 1 at the head and the queue is full, adding 6 drops the oldest element and writes 6 into its slot: the array contents become [6, 2, 3, 4, 5], the head moves to 2, and nothing else has to move.
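That head-index bookkeeping is exactly a circular (ring) buffer. A minimal fixed-size sketch, where enqueueing into a full buffer overwrites the oldest element instead of shifting anything:

```csharp
using System;

class RingBuffer<T>
{
    private readonly T[] items;
    private int head;   // index of the oldest element
    private int count;

    public RingBuffer(int capacity) { items = new T[capacity]; }

    public int Count => count;

    public void Enqueue(T value)
    {
        int tail = (head + count) % items.Length;
        items[tail] = value;
        if (count < items.Length) count++;
        else head = (head + 1) % items.Length;   // full: the oldest slot was overwritten
    }

    public T Dequeue()
    {
        if (count == 0) throw new InvalidOperationException("empty");
        T value = items[head];
        items[head] = default(T);
        head = (head + 1) % items.Length;
        count--;
        return value;
    }
}
```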
So here are the main difference/avantages/disavantages of using both arrays and linked-lists:
Arrays:
- Adding items to an array can be relatively costly if the insertion is not made at the end (as is deleting), because all the elements after that point have to be moved.
- Very efficient if objects are added at the end.
- Access to the elements is very fast: simply index off the base address!
LinkedList:
- Adding elements anywhere in the queue is always the same cost in time, and is very fast
- Accessing the elements has to be done by walking the list with an accessor (iterator), so random access is slow.
So you're trying to implement a queue... but what kind of queue?
It all depends on what you will do with it.
If you're implementing a First In First Out queue (or a Last In First Out stack), you are better off using a linked list, since you only ever work at the ends of the list and can keep an accessor to the front and the back.
But if you want a queue and have to constantly access your elements at different places, go for the array!
From what I understood of your task, I would have recommended a Linked List... but you will know best!
This will only be a problem if you start having a LOT of elements in your queue. If you stay below a few thousand, it doesn't really matter.
Hope it helps!
How will your bounded queue behave when an element is added beyond its capacity? Will the first item be pushed out like this [1, 2, 3] -> [2, 3, 4] or will the last item be replaced like this [1, 2, 3] -> [1, 2, 4]? If the former, then I'd recommend a LinkedList. If the latter, an array or List<T> is fine. I just thought I'd ask since the behavior of your object will determine the appropriate course of action, and that behavior was never defined as far as I can tell. Maybe everyone but me just already knows exactly what you meant by a "bounded queue", but I didn't want to assume.
I am parsing an arbitrary length byte array that is going to be passed around to a few different layers of parsing. Each parser creates a Header and a Packet payload just like any ordinary encapsulation.
My problem lies in how the encapsulation holds its packet byte array payload. Say I have a 100 byte array with three levels of encapsulation. Three packet objects will be created and I want to set the payload of these packets to the corresponding position in the byte array of the packet.
For example, let's say the payload size is 20 for all levels, then imagine it has a public byte[] Payload on each object. However, the problem is that this byte[] Payload is a copy of the original 100 bytes, so I'm going to end up with 160 bytes in memory instead of 100.
If it were in C++, I could just easily use a pointer - however, I'm writing this in C#.
So I created the following class:
public class PayloadSegment<T> : IEnumerable<T>
{
    public readonly T[] Array;
    public readonly int Offset;
    public readonly int Count;

    public PayloadSegment(T[] array, int offset, int count)
    {
        this.Array = array;
        this.Offset = offset;
        this.Count = count;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= this.Count)
                throw new IndexOutOfRangeException();
            return Array[Offset + index];
        }
        set
        {
            if (index < 0 || index >= this.Count)
                throw new IndexOutOfRangeException();
            Array[Offset + index] = value;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = Offset; i < Offset + Count; i++)
            yield return Array[i];
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
This way I can simply reference a position inside the original byte array but use positional indexing. However, if I do something like:
PayloadSegment<byte> something = new PayloadSegment<byte>(someArray, 5, 10);
byte[] somethingArray = something.ToArray();
Will the somethingArray be a copy of the bytes, or a reference to the original PayloadSegment (which in turn is a reference to the original byte array)?
EDIT: Actually after rethinking this, can't I simply use a new MemoryStream(array, offset, length)?
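For what it's worth, the BCL already has a type shaped like this class: ArraySegment&lt;T&gt; wraps an array, an offset, and a count without copying anything (and yes, new MemoryStream(array, offset, length) gives you a similar no-copy view for stream-style access):

```csharp
byte[] payload = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var segment = new ArraySegment<byte>(payload, 5, 3);
Console.WriteLine(segment.Count);                  // 3
Console.WriteLine(segment.Array[segment.Offset]);  // 5 - same backing array

payload[5] = 42;                                   // writes to the source...
Console.WriteLine(segment.Array[segment.Offset]);  // 42 - ...show through
```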
The documentation for the Enumerable.ToArray extension method doesn't specifically mention what it does when it's passed a sequence that happens to already be an array. But a simple check with .NET Reflector reveals that it does indeed create a copy of the array.
It is worth noting however that when given a sequence that implements ICollection<T> (which Array does) the copy can be done much faster because the number of elements is known up front so it does not have to do dynamic resizing of the buffer such as List<T> does.
There is a very strong practice which suggests that calling "ToArray" on an object should return a new array which is detached from anything else. Nothing that is done to the original object should affect the array, and nothing which is done to the array should affect the original object. My personal preference would have been to call the routine "ToNewArray", to make explicit that each call will return a different new array.
A few of my classes have an "AsReadableArray", which returns an array which may or may not be attached to anything else. The array won't change in response to manipulations to the original object, but it's possible that multiple reads yielding the same data (which they often will) will return the same array. I really wish .net had an ImmutableArray type, supporting the same sorts of operations as String [a String, in essence, being an ImmutableArray(Of Char)], and a ReadableArray abstract type (from which both Array and ImmutableArray would inherit). I doubt such a thing could be squeezed into .Net 5.0, but it would allow a lot of things to be done much more cleanly.
It is a copy. When you call a To<Type> method (ToArray, ToList, ...), it creates a copy of the source elements in the target type.
Because byte is a value type, the array will hold copies of the values, not references to them.
If you need reference-type behaviour, it is best to create a class that holds the byte as a property, and can group other data and functionality with it.
It's a copy. It would be very unintuitive if I passed something.ToArray() to some method, and the method changed the value of something by changing the array!
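Easy to verify: after ToArray(), mutations on either side don't show through to the other:

```csharp
using System;
using System.Linq;

class ToArrayCopies
{
    static void Main()
    {
        byte[] original = { 1, 2, 3 };
        byte[] copy = original.ToArray();   // Enumerable.ToArray

        copy[0] = 99;
        Console.WriteLine(original[0]);     // 1 - the source is untouched

        original[1] = 77;
        Console.WriteLine(copy[1]);         // 2 - the copy is independent
    }
}
```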