Related
I'm learning C# and basically know the difference between arrays and Lists that the last is a generic and can dynamically grow but I'm wondering:
are List elements sequentially located in heap like array or is each element located "randomly" in a different locations?
and if that is true, does that affect the speed of access & data retrieval from memory?
and if that is true, is this what makes arrays a little faster than Lists?
Let's see the second and the third questions first:
and if that true does that affect the speed of access & data retrieval from memory ?
and if that true is this what makes array little faster than list ?
There is only a single type of "native" collection in .NET (with .NET I mean the CLR, so the runtime): the array (technically, if you consider a string a type of collection, then there are two native types of collections :-) ) (technically part 2: not all the arrays you think that are arrays are "native" arrays... Only the monodimensional 0 based arrays are "native" arrays. Arrays of type T[,] aren't, and arrays where the first element doesn't have an index of 0 aren't) . Every other collection (other than the LinkedList<>) is built atop it. If you look at the List<T> with IlSpy you'll see that at the base of it there is a T[] with an added int for the Count (the T[].Length is the Capacity). Clearly an array is a little faster than a List<T> because to use it, you have one less indirection (you access the array directly, instead of accessing the array that accesses the list).
Let's see the first question:
does List elements sequentially located in heap like array or each element is located randomly in different locations?
Being based on an array internally, clearly the List<> memorizes its elements like an array, so in a contiguous block of memory (but be aware that with a List<SomeObject> where SomeObject is a reference type, the list is a list of references, not of objects, so the references are put in a contiguous block of memory (we will ignore that with the advanced memory management of computers, the word "contiguous block of memory" isn't exact", it would be better to say "a contiguous block of addresses") )
(yes, even Dictionary<> and HashSet<> are built atop arrays. Conversely a tree-like collection could be built without using an array, because it's more similar to a LinkedList)
Some additional details: there are four groups of instructions in the CIL language (the intermediate language used in compiled .NET programs) that are used with "native" arrays:
Newarr
Ldelem and family Ldelem_*
Stelem and family Stelem_*
ReadOnly (don't ask me its use, I don't know, and the documentation isn't clear)
if you look at OpCodes.Newarr you'll see this comment in the XML documentation:
// Summary:
// Pushes an object reference to a new zero-based, one-dimensional array whose
// elements are of a specific type onto the evaluation stack.
Yes, elements in a List are stored contiguously, just like an array. A List actually uses arrays internally, but that is an implementation detail that you shouldn't really need to be concerned with.
Of course, in order to get the correct impression from that statement, you also have to understand a bit about memory management in .NET. Namely, the difference between value types and reference types, and how objects of those types are stored. Value types will be stored in contiguous memory. With reference types, the references will be stored in contiguous memory, but not the instances themselves.
The advantage of using a List is that the logic inside of the class handles allocating and managing the items for you. You can add elements anywhere, remove elements from anywhere, and grow the entire size of the collection without having to do any extra work. This is, of course, also what makes a List slightly slower than an array. If any reallocation has to happen in order to comply with your request, there'll be a performance hit as a new, larger-sized array is allocated and the elements are copied to it. But it won't be any slower than if you wrote the code to do it manually with a raw array.
If your length requirement is fixed (i.e., you never need to grow/expand the total capacity of the array), you can go ahead and use a raw array. It might even be marginally faster than a List because it avoids the extra overhead and indirection (although that is subject to being optimized out by the JIT compiler).
If you need to be able to dynamically resize the collection, or you need any of the other features provided by the List class, just use a List. The performance difference will be virtually imperceptible.
I'm about to create 100,000 objects in code. They are small ones, only with 2 or 3 properties. I'll put them in a generic list and when they are, I'll loop them and check value a and maybe update value b.
Is it faster/better to create these objects as class or as struct?
EDIT
a. The properties are value types (except the string i think?)
b. They might (we're not sure yet) have a validate method
EDIT 2
I was wondering: are objects on the heap and the stack processed equally by the garbage collector, or does that work different?
Is it faster to create these objects as class or as struct?
You are the only person who can determine the answer to that question. Try it both ways, measure a meaningful, user-focused, relevant performance metric, and then you'll know whether the change has a meaningful effect on real users in relevant scenarios.
Structs consume less heap memory (because they are smaller and more easily compacted, not because they are "on the stack"). But they take longer to copy than a reference copy. I don't know what your performance metrics are for memory usage or speed; there's a tradeoff here and you're the person who knows what it is.
Is it better to create these objects as class or as struct?
Maybe class, maybe struct. As a rule of thumb:
If the object is :
1. Small
2. Logically an immutable value
3. There's a lot of them
Then I'd consider making it a struct. Otherwise I'd stick with a reference type.
If you need to mutate some field of a struct it is usually better to build a constructor that returns an entire new struct with the field set correctly. That's perhaps slightly slower (measure it!) but logically much easier to reason about.
Are objects on the heap and the stack processed equally by the garbage collector?
No, they are not the same because objects on the stack are the roots of the collection. The garbage collector does not need to ever ask "is this thing on the stack alive?" because the answer to that question is always "Yes, it's on the stack". (Now, you can't rely on that to keep an object alive because the stack is an implementation detail. The jitter is allowed to introduce optimizations that, say, enregister what would normally be a stack value, and then it's never on the stack so the GC doesn't know that it is still alive. An enregistered object can have its descendents collected aggressively, as soon as the register holding onto it is not going to be read again.)
But the garbage collector does have to treat objects on the stack as alive, the same way that it treats any object known to be alive as alive. The object on the stack can refer to heap-allocated objects that need to be kept alive, so the GC has to treat stack objects like living heap-allocated objects for the purposes of determining the live set. But obviously they are not treated as "live objects" for the purposes of compacting the heap, because they're not on the heap in the first place.
Is that clear?
Sometimes with struct you don't need to call the new() constructor, and directly assign the fields making it much faster that usual.
Example:
Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
list[i].id = i;
list[i].isValid = true;
}
is about 2 to 3 times faster than
Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
list[i] = new Value(i, true);
}
where Value is a struct with two fields (id and isValid).
struct Value
{
int id;
bool isValid;
public Value(int i, bool isValid)
{
this.i = i;
this.isValid = isValid;
}
}
On the other hand is the items needs to be moved or selected value types all that copying is going to slow you down. To get the exact answer I suspect you have to profile your code and test it out.
Arrays of structs are represented on the heap in a contiguous block of memory, whereas an array of objects is represented as a contiguous block of references with the actual objects themselves elsewhere on the heap, thus requiring memory for both the objects and for their array references.
In this case, as you are placing them in a List<> (and a List<> is backed onto an array) it would be more efficient, memory-wise to use structs.
(Beware though, that large arrays will find their way on the Large Object Heap where, if their lifetime is long, may have an adverse affect on your process's memory management. Remember, also, that memory is not the only consideration.)
Structs may seem similar to classes, but there are important differences that you should be aware of. First of all, classes are reference types and structs are value types. By using structs, you can create objects that behave like the built-in types and enjoy their benefits as well.
When you call the New operator on a class, it will be allocated on the heap. However, when you instantiate a struct, it gets created on the stack. This will yield performance gains. Also, you will not be dealing with references to an instance of a struct as you would with classes. You will be working directly with the struct instance. Because of this, when passing a struct to a method, it's passed by value instead of as a reference.
More here:
http://msdn.microsoft.com/en-us/library/aa288471(VS.71).aspx
If they have value semantics, then you should probably use a struct. If they have reference semantics, then you should probably use a class. There are exceptions, which mostly lean towards creating a class even when there are value semantics, but start from there.
As for your second edit, the GC only deals with the heap, but there is a lot more heap space than stack space, so putting things on the stack isn't always a win. Besides which, a list of struct-types and a list of class-types will be on the heap either way, so this is irrelevant in this case.
Edit:
I'm beginning to consider the term evil to be harmful. After all, making a class mutable is a bad idea if it's not actively needed, and I would not rule out ever using a mutable struct. It is a poor idea so often as to almost always be a bad idea though, but mostly it just doesn't coincide with value semantics so it just doesn't make sense to use a struct in the given case.
There can be reasonable exceptions with private nested structs, where all uses of that struct are hence restricted to a very limited scope. This doesn't apply here though.
Really, I think "it mutates so it's a bad stuct" is not much better than going on about the heap and the stack (which at least does have some performance impact, even if a frequently misrepresented one). "It mutates, so it quite likely doesn't make sense to consider it as having value semantics, so it's a bad struct" is only slightly different, but importantly so I think.
The best solution is to measure, measure again, then measure some more. There may be details of what you're doing that may make a simplified, easy answer like "use structs" or "use classes" difficult.
A struct is, at its heart, nothing more nor less than an aggregation of fields. In .NET it's possible for a structure to "pretend" to be an object, and for each structure type .NET implicitly defines a heap object type with the same fields and methods which--being a heap object--will behave like an object. A variable which holds a reference to such a heap object ("boxed" structure) will exhibit reference semantics, but one which holds a struct directly is simply an aggregation of variables.
I think much of the struct-versus-class confusion stems from the fact that structures have two very different usage cases, which should have very different design guidelines, but the MS guidelines don't distinguish between them. Sometimes there is a need for something which behaves like an object; in that case, the MS guidelines are pretty reasonable, though the "16 byte limit" should probably be more like 24-32. Sometimes, however, what's needed is an aggregation of variables. A struct used for that purpose should simply consist of a bunch of public fields, and possibly an Equals override, ToString override, and IEquatable(itsType).Equals implementation. Structures which are used as aggregations of fields are not objects, and shouldn't pretend to be. From the structure's point of view, the meaning of field should be nothing more or less than "the last thing written to this field". Any additional meaning should be determined by the client code.
For example, if a variable-aggregating struct has members Minimum and Maximum, the struct itself should make no promise that Minimum <= Maximum. Code which receives such a structure as a parameter should behave as though it were passed separate Minimum and Maximum values. A requirement that Minimum be no greater than Maximum should be regarded like a requirement that a Minimum parameter be no greater than a separately-passed Maximum one.
A useful pattern to consider sometimes is to have an ExposedHolder<T> class defined something like:
class ExposedHolder<T>
{
public T Value;
ExposedHolder() { }
ExposedHolder(T val) { Value = T; }
}
If one has a List<ExposedHolder<someStruct>>, where someStruct is a variable-aggregating struct, one may do things like myList[3].Value.someField += 7;, but giving myList[3].Value to other code will give it the contents of Value rather than giving it a means of altering it. By contrast, if one used a List<someStruct>, it would be necessary to use var temp=myList[3]; temp.someField += 7; myList[3] = temp;. If one used a mutable class type, exposing the contents of myList[3] to outside code would require copying all the fields to some other object. If one used an immutable class type, or an "object-style" struct, it would be necessary to construct a new instance which was like myList[3] except for someField which was different, and then store that new instance into the list.
One additional note: If you are storing a large number of similar things, it may be good to store them in possibly-nested arrays of structures, preferably trying to keep the size of each array between 1K and 64K or so. Arrays of structures are special, in that indexing one will yield a direct reference to a structure within, so one can say "a[12].x = 5;". Although one can define array-like objects, C# does not allow for them to share such syntax with arrays.
Use classes.
On a general note. Why not update value b as you create them?
From a c++ perspective I agree that it will be slower modifying a structs properties compared to a class. But I do think that they will be faster to read from due to the struct being allocated on the stack instead of the heap. Reading data from the heap requires more checks than from the stack.
Well, if you go with struct afterall, then get rid of string and use fixed size char or byte buffer.
That's re: performance.
I have a class that's going to hold 3 parallel arrays. (For a class assignment we're basically coding a rudimentary xml parser...beginning programming class)
*Note, I'm doing this in very basic OOP. I've got an XMLObject class which has the arrays, and holds the xml elements in one array, the data values in another, and the ending elements in the third. I've also got an XMLParse object that does the actual parsing, and stores the strings to their various arrays as it finds them. I've been forbiddin from using .net's xml stuff for this assignment, has to be a byte by byte read in.
Now I was reading on MSDN about indexers, and as I understand it, I can either only have one array using an indexer(since that's the only way for properties to receive parameters), or I have to make my arrays public so that the parse class can add to them, and main or another class can read from them.
Do I have that right or am I missing something/not understanding how to get and set arrays of one class from another?
Does the same go for list as well?
If I understand your question you want to have many indexers on your class which will return the various elements from the arrays that the class holds.
you can have many indexers, but only if the type used by each indexer is different. So you could have an indexer by int and an indexer by string, but not 2 indexers by int.
From the sounds of things you won't be able to use indexers to access all the values your class holds, as they will probably all want to use int.
Publicly exposing the arrays is one option, but you could also provide different methods for reading each thing, so you could have GetXmlElement(int index), GetDataValue(int index) and GetEndingElement(int index) to provide access to the contents of the arrays.
Another option would be to store the data in arrays internally, but accept and return a class which bundled up all the data together. This way you could have a single indexer which returned all the data, as all the data would be a single object with the element, data value and end element in it.
you would have to provide similar methods for adding data and potentially removing and changing as well. Whether you want to do this, opr just exopse the underlying arrays/lists depends on if you want to be able to exercise control over the adding/accessing/deleting/changing of the arrays or not.
I suggest you use List<T> instead of array for storing the data. It is much easier to work with.
See here.
I was reading XNA library code and inside the type VertexPositionColor, they supress the CA2105:ArrayFieldsShouldNotBeReadOnly message with the justification "The performance cost of cloning the array each time it is used is too great."
public struct VertexPositionColor
{
public static readonly VertexElement [ ] VertexElements;
}
But why would it be copied when it's used? This only happens for structs where the accessed property/field is a ValueType, right?
I guess they are justifying the fact that they are exposing an array field more than anything else and the underlying reason of why they are doing so is performance:
The alternative they probably had in mind was making the array field private with a property exposing an IEnumerable or returning a copy of the array each time the property was accesed.
EDIT. Edited the answer a little to make clearer what I was trying to say :p.
In most cases they'd be better off using Array.AsReadOnly and returning a generic ReadOnlyCollection. According to the documentation that's an O(1) operation.
In the current implementation callers can change the values in the array (modifying the static/global state directly).
One more reason to read Framework Design Guidelines - it gives you the reasons behind FxCop's recommendations.
As I understand, using structure value types will always give better performance than using reference types in an array or list. Is there any downside involved in using struct instead of class type in a generic list?
PS : I am aware that MSDN recommends that struct should be maximum 16 bytes, but I have been using 100+ byte structure without problems so far. Also, when I get the maximum stack memory error exceeded for using a struct, I also run out of heap space if I use a class instead.
There is a lot of misinformation out there about struct vs. reference types in .Net. Anything which makes blanket statements like "structs will always perform better in ..." is almost certainly wrong. It's almost impossible to make blanket statements about performance.
Here are several items related to value types in a generic collection which will / can affect performance.
Using a value types in a generic instantiation can cause extra copies of methods to be JIT'd at runtime. For reference types only one instance will be generated
Using value types will affect the size of the allocated array to be count * size of the specific value type vs. reference types which have all have the same size
Adding / accessing values in the collection will incur copy overhead. The performance of this changes based on the size of the item. For references again it's the same no matter the type and for value types it will vary based on the size
As others have pointed out, there are many downsides to using large structures in a list. Some ramifications of what others have said:
Say you're sorting a list whose members are 100+ byte structures. Every time items have to be swapped, the following occurs:
var temp = list[i];
list[i] = list[j];
list[j] = temp;
The amount of data copied is 3*sizeof(your_struct). If you're sorting a list that's made up of reference types, the amount of data copied is 3*sizeof(IntPtr): 12 bytes in the 32-bit runtime, or 24 bytes in the 64-bit runtime. I can tell you from experience that copying large structures is far more expensive than the indirection inherent in using reference types.
Using structures also reduces the maximum number of items you can have in a list. In .NET, the maximum size of any single data structure is 2 gigabytes (minus a little bit). A list of structures has a maximum capacity of 2^31/sizeof(your_struct). So if your structure is 100 bytes in size, you can have at most about 21.5 million of them in a list. But if you use reference types, your maximum is about 536 million in the 32-bit runtime (although you'll run out of memory before you reach that limit), or 268 million in the 64-bit runtime. And, yes, some of us really do work with that many things in memory.
using structure value types will always give better performance than using reference types in an array or list
There is nothing true in that statement.
Take a look at this question and answer.
With structs, you cannot have code reuse in the form of class inheritance. A struct can only implement interfaces but cannot inherit from a class or another struct whereas a class can inherit from another class and of course implement interfaces.
When storing data in a List<T> or other collection (as opposed to keeping a list of controls or other active objects) and one wishes to allow the data to change, one should generally follow one of four patterns:
Store immutable objects in the list, and allow the list itself to change
Store mutable objects in the list, but only allow objects created by the owner of the list to be stored therein. Allow outsiders to access the mutable objects themselves.
Only store mutable objects to which no outside references exist, and don't expose to the outside world any references to objects within the list; if information from the list is requested, copy it from the objects in the list.
Store value types in the list.
Approach #1 is the simplest, if the objects one wants to store are immutable. Of course, the requirement that objects be immutable can be somewhat limiting.
Approach #2 can be convenient in some cases, and it permits convenient updating of data in the list (e.g. MyList[index].SomeProperty += 5;) but the exact semantics of how returned properties are, or remain, attached to items in the list may sometimes be unclear. Further, there's no clear way to load all the properties of an item in the list from an 'example' object.
Approach #3 has simple-to-understand semantics (changing an object after giving it to the list will have no effect, objects retrieved from the list will not be affected by subsequent changes to the list, and changes to objects retrieved from a list will not affect the list themselves unless the objects are explicitly written back), but requires defensive copying on every list access, which can be rather bothersome.
Approach #4 offers essentially the same semantics as approach #3, but copying a struct is cheaper than making a defensive copy of a class object. Note that if the struct is mutable, the semantics of:
var temp = MyList[index];
temp.SomeField += 5;
MyList[index] temp;
are clearer than anything that can be achieved with so-called "immutable" (i.e. mutation-only-by-assignment) structs. To know what the above does, all one needs to know about the struct is that SomeField is a public field of some particular type. By contrast, even something like:
var temp = MyList[index];
temp = temp.WithSomeField(temp.SomeField + 5);
MyList[index] temp;
which is about the best one could hope for with such a struct, would be much harder to read than the easily-mutable-struct version. Further, to be sure of what the above actually does, one would have to examine the definition of the struct's WithSomeField method and any constructors or methods employed thereby, as well as all of the struct's fields, to determine whether it had any side-effects other than modifying SomeField.