Ok, as I understand it, immutable types are inherently thread safe or so I've read in various places and I think I understand why it is so. If the inner state of an instance can not be modified once the object is created there seems to be no problems with concurrent access to the instance itself.
Therefore, I could create the following List:
class ImmutableList<T>: IEnumerable<T>
{
    readonly List<T> innerList;

    public ImmutableList(IEnumerable<T> collection)
    {
        this.innerList = new List<T>(collection);
    }

    public ImmutableList()
    {
        this.innerList = new List<T>();
    }

    public ImmutableList<T> Add(T item)
    {
        var list = new ImmutableList<T>(this.innerList);
        list.innerList.Add(item);
        return list;
    }

    public ImmutableList<T> Remove(T item)
    {
        var list = new ImmutableList<T>(this.innerList);
        list.innerList.Remove(item);
        return list;
    } //and so on with relevant List methods...

    public T this[int index]
    {
        get
        {
            return this.innerList[index];
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        return innerList.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return ((System.Collections.IEnumerable)this.innerList).GetEnumerator();
    }
}
So the question is: Is this really an immutable type? Is it really thread safe?
Obviously the type itself is immutable, but there is absolutely no guarantee that T is, and therefore you could have concurrent access and threading issues related directly to the generic type. Would that mean that ImmutableList<T> should be considered mutable?
Should class ImmutableList<T>: IEnumerable<T> where T: struct be the only type truly considered immutable?
Thanks for any input on this issue.
UPDATE: A lot of answers/comments are concentrating on the particular implementation of ImmutableList I've posted which is probably not a very good example. But the issue of the question is not the implementation. The question I'm asking is if ImmutableList<MutableT> is really an immutable type considering everything that an immutable type entails.
If the inner state of an instance can not be modified once the object is created there seems to be no problems with concurrent access to the instance itself.
That is generally the case, yes.
Is this really an immutable type?
To briefly sum up: you have a copy-on-write wrapper around a mutable list. Adding a new member to an immutable list does not mutate the list; instead it makes a copy of the underlying mutable list, adds to the copy, and returns a wrapper around the copy.
Provided that the underlying list object you are wrapping does not mutate its internal state when it is read from, you have met your original definition of "immutable", so, yes.
I note that this is not a very efficient way to implement an immutable list. You'd likely do better with an immutable balanced binary tree, for example. Your sketch is O(n) in both time and memory every time you make a new list; you can improve that to O(log n) without too much difficulty.
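To illustrate the structural sharing that makes cheaper immutable collections possible, here is a minimal sketch. It deliberately swaps in a persistent singly linked list rather than the balanced binary tree mentioned above, because it is much shorter; the names PersistentList<T>, Push and Empty are hypothetical. Push is O(1) because the new list reuses the entire old list as its tail. Indexing would still be O(n), which is the part a balanced tree improves on.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch: an immutable singly linked list with structural sharing.
public sealed class PersistentList<T> : IEnumerable<T>
{
    public static readonly PersistentList<T> Empty = new PersistentList<T>();

    private readonly T head;
    private readonly PersistentList<T> tail; // null only for the Empty instance

    public int Count { get; }
    public bool IsEmpty => tail == null;

    private PersistentList() { } // Count stays 0, tail stays null

    private PersistentList(T head, PersistentList<T> tail)
    {
        this.head = head;
        this.tail = tail;
        Count = tail.Count + 1;
    }

    // O(1): allocates one node and shares the whole existing list as its tail.
    public PersistentList<T> Push(T item) => new PersistentList<T>(item, this);

    public IEnumerator<T> GetEnumerator()
    {
        for (var node = this; !node.IsEmpty; node = node.tail)
            yield return node.head;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class PersistentListDemo
{
    public static void Main()
    {
        var a = PersistentList<int>.Empty.Push(1).Push(2);
        var b = a.Push(3);                       // a is unchanged
        Console.WriteLine(string.Join(",", a));  // 2,1
        Console.WriteLine(string.Join(",", b));  // 3,2,1
    }
}
```

No field is ever written after construction, so any number of threads can read a and b concurrently without locks.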
Is it really thread safe?
Provided that the underlying mutable list is threadsafe for multiple readers, yes.
This might be of interest to you:
http://blogs.msdn.com/b/ericlippert/archive/2011/05/23/read-only-and-threadsafe-are-different.aspx
Obviously the type itself is immutable, but there is absolutely no guarantee that T is, and therefore you could have concurrent access and threading issues related directly to the generic type. Would that mean that ImmutableList<T> should be considered mutable?
That's a philosophical question, not a technical one. If you have an immutable list of people's names, and the list never changes, but one of the people dies, was the list of names "mutable"? I would think not.
A list is immutable if any question about the list always has the same answer. In our list of people's names, "how many names are on the list?" is a question about the list. "How many of those people are alive?" is not a question about the list, it is a question about the people referred to by the list. The answer to that question changes over time; the answer to the first question does not.
Should class ImmutableList<T>: IEnumerable<T> where T: struct be the only type truly considered immutable?
I'm not following you. How does restricting T to be a struct change anything? OK, T is restricted to struct. I make an immutable struct:
struct S
{
    public int[] MutableArray { get; private set; }
    ...
}
And now I make an ImmutableList<S>. What stops me from modifying the mutable array stored in instances of S? Just because the list is immutable and the struct is immutable doesn't make the array immutable.
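A small sketch of that point, under stated assumptions: Holder is a hypothetical struct standing in for S (the original elides the rest of S, so nothing here claims to reproduce it), and a read-only wrapper from List<T>.AsReadOnly() stands in for the immutable list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct: it has no setters, yet it exposes a mutable array.
public readonly struct Holder
{
    public int[] Data { get; }
    public Holder(int[] data) { Data = data; }
}

public static class ShallowImmutabilityDemo
{
    public static void Main()
    {
        // A read-only wrapper whose backing list is referenced nowhere else,
        // so the set of elements cannot change...
        IReadOnlyList<Holder> list =
            new List<Holder> { new Holder(new[] { 1, 2, 3 }) }.AsReadOnly();

        // ...but the array that an element refers to still can.
        list[0].Data[0] = 99;

        Console.WriteLine(list.Count);       // 1
        Console.WriteLine(list[0].Data[0]);  // 99
    }
}
```

The indexer hands back a copy of the struct, but the copy carries the same array reference, so the write is visible through the list.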
Immutability is sometimes defined in different ways. So is thread-safety.
In creating an immutable list whose purpose is to be immutable, you should document just what guarantees you are making. E.g. in this case you guarantee that the list itself is immutable and does not have any hidden mutability. (Some apparently immutable objects are actually mutable behind the scenes, with e.g. memoisation or internal re-sorting as an optimisation, which removes the thread-safety that comes from immutability, though such internal mutations can also be performed in a manner that guarantees thread-safety in a different way.) You are not guaranteeing that the objects stored can be used in a thread-safe manner.
The thread-safety that you should document relates to this. You cannot guarantee that other code won't hold a reference to the same object (you could if you were creating new objects on each call). You can guarantee that operations will not corrupt the list itself.
Insisting upon T : struct could help, as it would mean that you could ensure that each time you return an item, it's a new copy of the struct (T : struct alone wouldn't do that, as you could have operations that didn't mutate the list, but did mutate its members, so obviously you have to also do this).
This, though, limits you: it rules out immutable reference types (e.g. string, which tends to be a member of collections in lots of real-world cases), and it doesn't allow users to provide their own means of ensuring that the mutability of the contained items doesn't cause problems. Since no thread-safe object can guarantee that all the code it is used in is thread-safe, there's little point trying to ensure that (help as much as you can by all means, but don't try to ensure what you can't ensure).
It also doesn't protect mutable members of immutable structs in your immutable list!
Using your code, let's say I do this:
ImmutableList<int> mylist = new ImmutableList<int>();
mylist.Add(1);
... your code, posted on StackOverflow, causes a StackOverflowException. There are quite a few sensible ways to create thread-safe collections; repeatedly copying collections (or at least trying to) and calling them immutable doesn't quite do the trick.
Eric Lippert posted a link that might be very worth reading.
A prime example of a data type that behaves as an immutable list of mutable objects: MulticastDelegate. A MulticastDelegate may be modeled pretty accurately as an immutable list of (object, method) pairs. The set of methods, and the identities of the objects upon which they act are immutable, but in the vast majority of cases the objects themselves will be mutable. Indeed, in many if not most cases, the very purpose of the delegate will be to mutate the objects to which it holds references.
It is not the responsibility of a delegate to know whether the methods it's going to invoke upon its target objects might mutate them in thread-unsafe fashion. The delegate is responsible merely for ensuring that its lists of functions and object identities are immutable, and I don't think anyone would expect it to do more than that.
An ImmutableList<T> should likewise always hold the same set of instances of type T. The properties of those instances might change, but not their identity. If a List<Car> was created holding two Fords, serial #1234 and #4422, both of which happened to be red, and one of the Fords was painted blue, what started out as a list of two red cars would have changed to a list holding a blue car and a red car, but it would still hold #1234 and #4422.
I would say that generic list is not immutable if the element is mutable, because it does not represent the full snapshot of the data state.
To achieve immutability in your example you would have to create deep copies of list elements, which is not efficient to do every time.
You can see my solution to this problem at IOG library
I believe that one thing adding to the rather lengthy discussion on this topic is that immutability/mutability should be considered along with scope.
As an example:
Say that I am working on a C# project and I define a static class with static data structures, and I define no way to modify those structures in my project. It is in effect a read-only cache of data that I can use, and for the purposes of my program, it is immutable from one run of my program to the next. I have total control over the data, and neither I nor the user can modify it at run time.
Now I modify my data in my source code and re-run the program. The data has changed, so in the highest sense of the word, the data is no longer immutable. From the highest level perspective, the data is actually mutable.
I would therefore posit that what we need to be discussing is not the black and white question blanketing something as immutable or not, but rather we should consider the degree of mutability of a particular implementation (as there are few things that actually never change, and are therefore truly immutable).
Related
I'm about to create 100,000 objects in code. They are small ones, only with 2 or 3 properties. I'll put them in a generic list and when they are, I'll loop them and check value a and maybe update value b.
Is it faster/better to create these objects as class or as struct?
EDIT
a. The properties are value types (except the string i think?)
b. They might (we're not sure yet) have a validate method
EDIT 2
I was wondering: are objects on the heap and the stack processed equally by the garbage collector, or does that work different?
Is it faster to create these objects as class or as struct?
You are the only person who can determine the answer to that question. Try it both ways, measure a meaningful, user-focused, relevant performance metric, and then you'll know whether the change has a meaningful effect on real users in relevant scenarios.
Structs consume less heap memory (because they are smaller and more easily compacted, not because they are "on the stack"). But they take longer to copy than a reference copy. I don't know what your performance metrics are for memory usage or speed; there's a tradeoff here and you're the person who knows what it is.
Is it better to create these objects as class or as struct?
Maybe class, maybe struct. As a rule of thumb:
If the object is :
1. Small
2. Logically an immutable value
3. There's a lot of them
Then I'd consider making it a struct. Otherwise I'd stick with a reference type.
If you need to mutate some field of a struct it is usually better to build a constructor that returns an entire new struct with the field set correctly. That's perhaps slightly slower (measure it!) but logically much easier to reason about.
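As a sketch of that last suggestion, applied to the "check value a and maybe update value b" loop from the question: Item, A, B and WithB are hypothetical names, and whether this is slower than in-place mutation is something to measure rather than assume.

```csharp
using System;

// Hypothetical small value type; Item, A, B and WithB are illustrative names.
public readonly struct Item
{
    public int A { get; }
    public int B { get; }

    public Item(int a, int b) { A = a; B = b; }

    // Returns a new value instead of mutating this one.
    public Item WithB(int b) => new Item(A, b);
}

public static class ItemDemo
{
    public static void Main()
    {
        var items = new Item[] { new Item(1, 10), new Item(2, 20) };

        // Check value A and maybe update value B by replacing the whole element.
        for (int i = 0; i < items.Length; i++)
            if (items[i].A == 2)
                items[i] = items[i].WithB(99);

        Console.WriteLine(items[0].B); // 10
        Console.WriteLine(items[1].B); // 99
    }
}
```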
Are objects on the heap and the stack processed equally by the garbage collector?
No, they are not the same because objects on the stack are the roots of the collection. The garbage collector does not need to ever ask "is this thing on the stack alive?" because the answer to that question is always "Yes, it's on the stack". (Now, you can't rely on that to keep an object alive because the stack is an implementation detail. The jitter is allowed to introduce optimizations that, say, enregister what would normally be a stack value, and then it's never on the stack so the GC doesn't know that it is still alive. An enregistered object can have its descendants collected aggressively, as soon as the register holding onto it is not going to be read again.)
But the garbage collector does have to treat objects on the stack as alive, the same way that it treats any object known to be alive as alive. The object on the stack can refer to heap-allocated objects that need to be kept alive, so the GC has to treat stack objects like living heap-allocated objects for the purposes of determining the live set. But obviously they are not treated as "live objects" for the purposes of compacting the heap, because they're not on the heap in the first place.
Is that clear?
Sometimes with a struct you don't need to call a constructor with new; you can assign the fields directly, which can be much faster than usual.
Example:
Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
    list[i].id = i;
    list[i].isValid = true;
}
is about 2 to 3 times faster than
Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
    list[i] = new Value(i, true);
}
where Value is a struct with two fields (id and isValid).
struct Value
{
    // Public so that the loop above can assign the fields directly.
    public int id;
    public bool isValid;

    public Value(int i, bool isValid)
    {
        this.id = i;
        this.isValid = isValid;
    }
}
On the other hand, if the items need to be moved or selected, then with value types all that copying is going to slow you down. To get the exact answer I suspect you have to profile your code and test it out.
Arrays of structs are represented on the heap in a contiguous block of memory, whereas an array of objects is represented as a contiguous block of references with the actual objects themselves elsewhere on the heap, thus requiring memory for both the objects and for their array references.
In this case, as you are placing them in a List<> (and a List<> is backed onto an array) it would be more efficient, memory-wise to use structs.
(Beware though, that large arrays will find their way onto the Large Object Heap where, if their lifetime is long, they may have an adverse effect on your process's memory management. Remember, also, that memory is not the only consideration.)
Structs may seem similar to classes, but there are important differences that you should be aware of. First of all, classes are reference types and structs are value types. By using structs, you can create objects that behave like the built-in types and enjoy their benefits as well.
When you call the new operator on a class, the instance is allocated on the heap. A struct, however, is stored inline wherever its variable lives: a local struct variable typically sits on the stack, while a struct held in a class field or an array lives on the heap inside its container. Avoiding a separate heap allocation can yield performance gains. Also, you will not be dealing with references to an instance of a struct as you would with classes. You will be working directly with the struct instance. Because of this, when passing a struct to a method, it's passed by value instead of as a reference.
More here:
http://msdn.microsoft.com/en-us/library/aa288471(VS.71).aspx
If they have value semantics, then you should probably use a struct. If they have reference semantics, then you should probably use a class. There are exceptions, which mostly lean towards creating a class even when there are value semantics, but start from there.
As for your second edit, the GC only deals with the heap, but there is a lot more heap space than stack space, so putting things on the stack isn't always a win. Besides which, a list of struct-types and a list of class-types will be on the heap either way, so this is irrelevant in this case.
Edit:
I'm beginning to consider the term evil to be harmful. After all, making a class mutable is a bad idea if it's not actively needed, and I would not rule out ever using a mutable struct. It is a poor idea so often as to almost always be a bad idea though, but mostly it just doesn't coincide with value semantics so it just doesn't make sense to use a struct in the given case.
There can be reasonable exceptions with private nested structs, where all uses of that struct are hence restricted to a very limited scope. This doesn't apply here though.
Really, I think "it mutates so it's a bad struct" is not much better than going on about the heap and the stack (which at least does have some performance impact, even if a frequently misrepresented one). "It mutates, so it quite likely doesn't make sense to consider it as having value semantics, so it's a bad struct" is only slightly different, but importantly so I think.
The best solution is to measure, measure again, then measure some more. There may be details of what you're doing that may make a simplified, easy answer like "use structs" or "use classes" difficult.
A struct is, at its heart, nothing more nor less than an aggregation of fields. In .NET it's possible for a structure to "pretend" to be an object, and for each structure type .NET implicitly defines a heap object type with the same fields and methods which--being a heap object--will behave like an object. A variable which holds a reference to such a heap object ("boxed" structure) will exhibit reference semantics, but one which holds a struct directly is simply an aggregation of variables.
I think much of the struct-versus-class confusion stems from the fact that structures have two very different usage cases, which should have very different design guidelines, but the MS guidelines don't distinguish between them. Sometimes there is a need for something which behaves like an object; in that case, the MS guidelines are pretty reasonable, though the "16 byte limit" should probably be more like 24-32. Sometimes, however, what's needed is an aggregation of variables. A struct used for that purpose should simply consist of a bunch of public fields, and possibly an Equals override, ToString override, and IEquatable(itsType).Equals implementation. Structures which are used as aggregations of fields are not objects, and shouldn't pretend to be. From the structure's point of view, the meaning of a field should be nothing more or less than "the last thing written to this field". Any additional meaning should be determined by the client code.
For example, if a variable-aggregating struct has members Minimum and Maximum, the struct itself should make no promise that Minimum <= Maximum. Code which receives such a structure as a parameter should behave as though it were passed separate Minimum and Maximum values. A requirement that Minimum be no greater than Maximum should be regarded like a requirement that a Minimum parameter be no greater than a separately-passed Maximum one.
A useful pattern to consider sometimes is to have an ExposedHolder<T> class defined something like:
class ExposedHolder<T>
{
    public T Value;
    public ExposedHolder() { }
    public ExposedHolder(T val) { Value = val; }
}
If one has a List<ExposedHolder<someStruct>>, where someStruct is a variable-aggregating struct, one may do things like myList[3].Value.someField += 7;, but giving myList[3].Value to other code will give it the contents of Value rather than giving it a means of altering it. By contrast, if one used a List<someStruct>, it would be necessary to use var temp=myList[3]; temp.someField += 7; myList[3] = temp;. If one used a mutable class type, exposing the contents of myList[3] to outside code would require copying all the fields to some other object. If one used an immutable class type, or an "object-style" struct, it would be necessary to construct a new instance which was like myList[3] except for someField which was different, and then store that new instance into the list.
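The difference between the two list shapes can be sketched as follows. Point is a hypothetical variable-aggregating struct standing in for someStruct, and ExposedHolder<T> is repeated here (with public constructors) only so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

public struct Point // hypothetical variable-aggregating struct
{
    public int X;
    public int Y;
}

public class ExposedHolder<T>
{
    public T Value;
    public ExposedHolder() { }
    public ExposedHolder(T val) { Value = val; }
}

public static class HolderDemo
{
    public static void Main()
    {
        var held = new List<ExposedHolder<Point>>
        {
            new ExposedHolder<Point>(new Point { X = 1 })
        };
        held[0].Value.X += 7;                // updates the struct in place, inside the holder
        Point snapshot = held[0].Value;      // a copy: changing it cannot affect the list
        snapshot.X = 0;
        Console.WriteLine(held[0].Value.X);  // 8

        var plain = new List<Point> { new Point { X = 1 } };
        // plain[0].X += 7;                  // compile error CS1612: the indexer returns a copy
        var temp = plain[0];
        temp.X += 7;
        plain[0] = temp;
        Console.WriteLine(plain[0].X);       // 8
    }
}
```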
One additional note: If you are storing a large number of similar things, it may be good to store them in possibly-nested arrays of structures, preferably trying to keep the size of each array between 1K and 64K or so. Arrays of structures are special, in that indexing one will yield a direct reference to a structure within, so one can say "a[12].x = 5;". Although one can define array-like objects, C# does not allow for them to share such syntax with arrays.
Use classes.
On a general note. Why not update value b as you create them?
From a C++ perspective I agree that it will be slower modifying a struct's properties compared to a class's. But I do think that they will be faster to read from due to the struct being allocated on the stack instead of the heap. Reading data from the heap requires more checks than from the stack.
Well, if you go with struct after all, then get rid of string and use a fixed-size char or byte buffer.
That's re: performance.
I'm having a hard time understanding when to use Object (boxing/unboxing) vs when to use generics.
For example:
public class Stack
{
    int position;
    object[] data = new object[10];
    public void Push(object o) { data[position++] = o; }
    public object Pop() { return data[--position]; }
}
VS.
public class Stack<T>
{
    int position;
    T[] data = new T[100];
    public void Push(T obj) { data[position++] = obj; }
    public T Pop() { return data[--position]; }
}
Which one should I use and under what conditions? It seems like with the System.Object way I can have objects of all sorts of types currently living within my Stack. So wouldn't this be always preferable? Thanks!
Always use generics! Using object results in cast operations and boxing/unboxing of value types. For these reasons generics are faster and more elegant (no casting). And, the main reason, you won't get InvalidCastExceptions using generics.
So, generics are faster and errors are visible at compile-time. System.Object means runtime exceptions and casting which in general results in lower performance (sometimes MUCH lower).
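A short sketch of the difference, using the framework's own System.Collections.Stack and Stack<T> rather than the hand-written classes from the question:

```csharp
using System;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static void Main()
    {
        // Non-generic: everything is stored as object, so ints are boxed.
        var untyped = new System.Collections.Stack();
        untyped.Push(1);       // boxes the int
        untyped.Push("two");   // compiles: any object is accepted

        try
        {
            int n = (int)untyped.Pop(); // "two" is not a boxed int
            Console.WriteLine(n);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException at run time");
        }

        // Generic: no boxing, no casts, and the same mistake cannot compile.
        var typed = new Stack<int>();
        typed.Push(1);
        // typed.Push("two");  // compile error CS1503
        int m = typed.Pop();
        Console.WriteLine(m);  // 1
    }
}
```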
A lot of people have recommended using generics, but it looks like they all miss the point. It's often not about the performance hit related to boxing primitive types or casting, it's about getting the compiler to work for you.
If I have a list of strings, I want the compiler to prove to me that it will always contain a list of strings. Generics does just that - I specify the intent, and the compiler proves it for me.
Ideally, I would prefer an even richer type system where you could say for example that a type (even if it was a reference type) could not contain null values, but C# does unfortunately not currently offer that.
While there are times when you will want to use a non-generic collection (think caching, for instance), you almost always have collections of homogeneous objects, not heterogeneous objects. For a homogeneous collection, even if it is a collection of variants of a base type or interface, it's always better to use generics. This will save you from having to cast the result to the real type before you can use it. Using generics makes your code more efficient and readable because you can omit the code to do the cast.
It all depends on what you need in the long run.
Unlike most answers here, I won't say "always use generics" because sometimes you do need to mix cats with cucumbers.
By all means, try to stick with generics for all the reasons already given in the other answers; for example, if you need to combine cats and dogs, create a base class Mammal and use Stack<Mammal>.
But when you really need to support every possible type, don't be afraid to use objects, they don't bite unless you're mistreating them. :)
With the object type, as you say you need to perform boxing and unboxing, which gets tedious very quickly. With generics, there's no need for that.
Also, I'd rather be more specific as to what kind of objects a class can work with and generics provides a great basis for that. Why mix unrelated data types in the first place? Your particular example of a stack emphasizes the benefit of generics over the basic object data type.
// This stack should only contain integers and not strings or floats or bools
Stack<int> intStack = new Stack<int>();
intStack.Push(1);
Remember that with generics you can specify interfaces so your class can interact with objects of many different classes, provided they all implement the same interface.
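One way to sketch this is with a generic constraint. IAnimal, Cat, Dog and Shelter<T> are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public interface IAnimal { string Speak(); } // hypothetical interface
public sealed class Cat : IAnimal { public string Speak() => "Meow"; }
public sealed class Dog : IAnimal { public string Speak() => "Woof"; }

// T may be any type that implements IAnimal.
public class Shelter<T> where T : IAnimal
{
    private readonly Stack<T> animals = new Stack<T>();

    public void Admit(T animal) => animals.Push(animal);

    // No cast needed: the constraint tells the compiler that T has Speak().
    public string ReleaseAndSpeak() => animals.Pop().Speak();
}

public static class ShelterDemo
{
    public static void Main()
    {
        var mixed = new Shelter<IAnimal>();
        mixed.Admit(new Cat());
        mixed.Admit(new Dog());
        Console.WriteLine(mixed.ReleaseAndSpeak()); // Woof
        Console.WriteLine(mixed.ReleaseAndSpeak()); // Meow
    }
}
```

A Shelter<Cat> would accept only cats, while Shelter<IAnimal> mixes implementations and still keeps anything that is not an IAnimal out at compile time.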
Use generics when you want your structure to handle a single type. For example, if you wanted a collection of strings you would want to instantiate a strongly typed List of strings like so:
List<string> myStrings = new List<string>();
If you want it to handle multiple types you can do without generics but you will incur a small performance hit for boxing/unboxing operations.
Generics are always preferred if possible.
Aside from performance, Generics allow you to make guarantees about the types of objects that you're working with.
The main reason this is preferred to casting is that the compiler knows what type the object is, and so it can give you compile errors that you find right away instead of runtime errors that might only happen under certain scenarios that you didn't test.
Generics are not a golden hammer. In cases where your activity is naturally non-generic, use good old object. One such case is caching: a cache can naturally hold different types. I've recently seen this implementation of a cache wrapper:
void AddCacheItem<T>(string key, T item, int duration, ICacheItemExpiration expiration)
{
    . . . . . . .
    CacheManager.Add(cacheKey, item, .....
}
Question: what is the type parameter for, if CacheManager takes object?
Then there was real havoc in the getter:
public virtual T GetCacheItem<T>(string cacheKey)
{
    return (T)CacheManager.GetData(cacheKey); // <-- problem code
}
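The problem above is that a value type will crash: if GetData returns null (for example, for a missing key), casting null to a value type T throws a NullReferenceException.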
I mended the method by adding this
public T GetCacheItem<T>(string cacheKey) where T : class
Because I like idea of doing this
var x = GetCacheItem<Person>("X");
string name = x?.FullName;
But I added new method, which will allow to take value types as well
public object GetCacheItem(string cacheKey)
The bottom line, there is usage for object, especially when storing different types in collection. Or when you have compositions where completely arbitrary and unrelated objects can exist when you need to consume them based on type.
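The failure mode in the getter can be reproduced with a small stand-in. ObjectCache below is hypothetical and is not the CacheManager API from the answer; it only assumes that a lookup of a missing key returns null. TryGet is one possible alternative to splitting the method in two, offered as a sketch rather than as what the answer did.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an object-based cache; not the real CacheManager API.
public class ObjectCache
{
    private readonly Dictionary<string, object> store = new Dictionary<string, object>();

    public void Add(string key, object item) => store[key] = item;

    // Assumption: a missing key yields null.
    public object GetData(string key) => store.TryGetValue(key, out var v) ? v : null;

    // Unconstrained cast: unboxing null to a value type throws NullReferenceException.
    public T GetUnsafe<T>(string key) => (T)GetData(key);

    // Reports success instead of casting blindly; works for value and reference types.
    public bool TryGet<T>(string key, out T value)
    {
        if (GetData(key) is T typed) { value = typed; return true; }
        value = default(T);
        return false;
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        var cache = new ObjectCache();
        cache.Add("answer", 42);
        Console.WriteLine(cache.GetUnsafe<int>("answer")); // 42

        try { cache.GetUnsafe<int>("missing"); }
        catch (NullReferenceException) { Console.WriteLine("NullReferenceException"); }

        Console.WriteLine(cache.TryGet<int>("missing", out var n)); // False
    }
}
```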
We have seen lots of discussion on SO regarding class vs struct in C#. It mostly ended with conclusions saying it's a matter of heap/stack memory allocation, and with recommendations to use structs for small data structures.
Now I have to decide between these two choices for a simple data store. Currently in our application we have thousands of classes that just act as simple data stores (only exposed public fields), and they are passed among different modules and services.
As per my understanding, it's better to move ahead with structs instead of classes for performance reasons, because these are simple data structures that only act as data stores.
Before proceeding with this, I need some expert advice from the people who have experienced this struggle.
Is my understanding correct?
I have seen that most ORMs use classes as data stores, so I suspect there is a reason to go with classes instead of structs. What would that be?
I would make the choice based on the following criteria
Reference type vs value type semantics. If 2 objects are only equal if they are the same object, that indicates reference type semantics => class. If the value of its members defines equality (e.g. 2 DateTimes are equal if both represent the same point in time, even if they are 2 distinct objects), that indicates value type semantics => struct.
Memory footprint of the object. If the object is huge and frequently allocated, making it a struct would consume the stack much faster, hence I'd rather have it as a class. On the contrary, I'd rather avoid the GC penalty for small value types; hence make them a struct.
Can you make the object immutable? I find structs great for 'value objects' - from the DDD book.
Would you face some boxing-unboxing penalty based on the usage of this object? If yes, go for class.
A pretty cool, not so well-known advantage of structs over classes is that there is an automatic implementation of GetHashCode and Equals in structs.
That's pretty useful when keys are required for dictionaries.
The struct implementation of GetHashCode and Equals is based on the binary content of the struct instances + reflection for the reference members (like String members and other instances of classes).
So the following code works for GetHashCode/Equals:
using System;
using System.Diagnostics;

public struct Person
{
    public DateTime Birthday { get; set; }
    public int Age { get; set; }
    public String Firstname { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        Person p1 = new Person { Age = 44, Birthday = new DateTime(1971, 5, 24), Firstname = "Emmanuel" };
        Person p2 = new Person { Age = 44, Birthday = new DateTime(1971, 5, 24), Firstname = "Emmanuel" };
        Debug.Assert(p1.Equals(p2));
        Debug.Assert(p1.GetHashCode() == p2.GetHashCode());
    }
}
Both assertions succeed when Person is a struct
Both assertions fail if Person is a class instead of a struct
Reference :
https://msdn.microsoft.com/en-Us/library/2dts52z7%28v=vs.110%29.aspx
Regards, best coding
Structs should be defined as immutable, whereas classes need not be. If you think your objects are going to be small and immutable, you can go ahead and make them structs; otherwise let them be classes.
I can never really seem to remember, exactly how structs are different, but they are. In subtle ways. In fact, sometimes they come and bite you.
So. Unless you know what you are doing, just stick to classes.
I know this sounds a little newbie. I know I should right now go and look up the differences and display them here - but that has already been done by others. All I'm saying is that adding a different type of object creates a semantic burden, a bit of extra complexity that you are wise to consider carefully.
If I remember correctly, one of the biggest problems is the value semantics of structs: passing them around will result in different objects (as they get passed by value). If you then change some field in one place, beware that in all other places the field did not get changed! That is why everyone is recommending immutability for structs!
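A minimal sketch of that pitfall, using a hypothetical mutable struct named Counter:

```csharp
using System;

public struct Counter // hypothetical mutable struct
{
    public int Count;
}

public static class CopyDemo
{
    // Receives a copy: the caller's variable is untouched.
    public static void Increment(Counter c) { c.Count++; }

    // Receives a reference to the caller's variable.
    public static void IncrementRef(ref Counter c) { c.Count++; }

    public static void Main()
    {
        var original = new Counter();
        var copy = original;   // assignment copies the whole value
        copy.Count = 5;

        Increment(original);
        Console.WriteLine(original.Count); // 0: neither the copy nor the call changed it

        IncrementRef(ref original);
        Console.WriteLine(original.Count); // 1
    }
}
```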
EDIT: For the case you are describing, structs won't work!
A class object has the advantage that it's possible to pass around a reference to it, with the scope and lifetime of such a reference being unlimited if it reaches outside code. A struct has the advantage that while it's possible to pass around short-lived references to them, it's not possible to pass around perpetual promiscuous references. This helps avoid having to worry about whether such references exist.
Some people have suggested that data holders which are mutable should not be structs. I emphatically disagree. Entities which exist for the purpose of holding data should, in many cases, be structs, especially if they are mutable. Eric Lippert has posted many times that he considers mutable value types evil (search under tags "mutable" and "struct"). It is certainly true that .NET allows certain things to be done with mutable structs which it shouldn't, and doesn't conveniently allow some things that it should, but POD ("Plain Old Data") structs which have no mutating methods, but instead expose their entire state via public fields, have a very useful consistency in their behavior which is not shared with any other data type. Using a POD struct may confuse someone who isn't familiar with how they work, but will make the program much more readable by anyone who does.
Consider, for example, the following code, assuming EmployeeInfoStruct contains nothing but value types and immutable class types like String:
[employeeInfoStruct is a struct containing the following field]
public Decimal YearlyBonus;
[someEmployeeContainer is an instance of a class which includes the following method]
EmployeeInfoStruct GetEmployeeInfo(String id); // Just the signature--code is immaterial
[some other method uses the following code]
EmployeeInfoStruct anEmployee = someEmployeeContainer.GetEmployeeInfo("123-45-6789");
anEmployee.YearlyBonus += 100;
Eric Lippert complains that the above code will alter the value in anEmployee, but that change won't have any effect on the container. I would suggest that's a good thing--anyone who knows how structs work could look at the above code and know writes to a struct variable will affect that variable, but won't affect anything else unless the program later uses some other method (perhaps SetEmployeeInfo) to store that variable someplace.
Now replace EmployeeInfoStruct with EmployeeInfoClass, which has a read/write property named YearlyBonus. Using just the information above, what can one say about the relationship between writes to someEmployeeContainer and anEmployee? Depending upon the implementations of anEmployee's class (which, unless EmployeeInfoClass is sealed, might or might not actually be EmployeeInfoClass) and someEmployeeContainer, the relationship between the objects could be anything. Writes to one might:
1. Have no effect on the other
2. Update the other in 'natural' fashion
3. Corrupt the other in some arbitrary way
With structs containing nothing but fields of either value types or immutable classes, the semantics are always going to be #1. One doesn't have to look at the code for the struct itself, nor the code of the container, to know that. By contrast, if anEmployee.YearlyBonus or someEmployeeContainer.GetEmployeeInfo is virtual, it's impossible to really know what the semantics will be.
It's important to note that, if structs are large, passing them by value or returning them from functions can be expensive. It's generally better to pass large structs as ref parameters when possible. Although the built-in collections really don't do a good job of facilitating such usage, it can make using a hundreds-of-bytes struct cheaper than using a class.
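A small sketch of the by-value versus by-ref difference follows. BigRecord and the method names are made up for illustration, and the 64-byte size is only an example, not a threshold.

```csharp
// Hypothetical large struct, used only for illustration.
public struct BigRecord
{
    public long A, B, C, D, E, F, G, H; // 64 bytes of fields
}

public static class RefPassingDemo
{
    // The whole struct is copied into the parameter on every call.
    public static long SumByValue(BigRecord r) { return r.A + r.H; }

    // Only a reference to the caller's storage is passed; nothing is copied.
    public static long SumByRef(ref BigRecord r) { return r.A + r.H; }

    // With ref, the callee can also update the caller's storage in place.
    public static void Bump(ref BigRecord r) { r.A += 1; }

    public static long Run()
    {
        var records = new BigRecord[1];   // arrays expose elements by reference
        records[0].A = 1;
        records[0].H = 2;
        Bump(ref records[0]);             // updates the element in place; A is now 2
        return SumByValue(records[0]) + SumByRef(ref records[0]); // 4 + 4
    }
}
```

Arrays allow this because records[0] is a variable. With List<BigRecord> the indexer returns a copy, so list[0].A = 1 does not compile and ref list[0] is not possible, which is the kind of friction with the built-in collections mentioned above.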
The comment about structs being immutable is correct. And this is where it can bite you. You can define structs with field setters, but when you change a field value you are changing a copy, not the original. So anything that still holds the old value will continue to see the old value. I don't like using mutable structs for this reason, as this can produce subtle and complex bugs (especially if you use complex compound statements).
On the other hand, there are lots of good reasons for using classes with immutable state also (think string).
I remember a piece of advice given on MSDN that a struct should not be larger than 16 or 21 bytes. Looking for the link, but can't find it yet.
The main implication was that once you have a string in your data type - make it a class without thinking. Otherwise the struct shouldn't hold much.
I think you have the right idea. Structs are made to mimic data-types. They are value driven, not reference based. If you look at the MSDN documentation for most of the base data classes (int, double, decimal, etc.) they are all based on structs. That being said, however, structs should not be overused for that very same reason. Room to store everything in that struct is allocated as soon as it is instantiated, whereas classes just allocate room for a reference to everything inside. If the data is in small enough chunks that this is not a problem, then structs are the way to go. If this is an issue, go with classes. If you don't know, then it might just be best to stick with what you are familiar with.
If you have low latency requirements and A LOT of objects, slow garbage collections can be a problem. In that case structs can be very helpful, because the garbage collector does not need to scan through a hierarchy of value types if the value types do not contain any reference types.
You can find a benchmark here: http://00sharp.wordpress.com/2013/07/03/a-case-for-the-struct/
I am wondering how immutability is defined? If the values aren't exposed as public, so can't be modified, then it's enough?
Can the values be modified inside the type, not by the customer of the type?
Or can one only set them inside a constructor? If so, in the cases of double initialization (using the this keyword on structs, etc) is still ok for immutable types?
How can I guarantee that the type is 100% immutable?
If the values aren't exposed as public, so can't be modified, then it's enough?
No, because you need read access.
Can the values be modified inside the type, not by the customer of the type?
No, because that's still mutation.
Or can one only set them inside a constructor?
Ding ding ding! With the additional point that immutable types often have methods that construct and return new instances, and also often have extra constructors marked internal specifically for use by those methods.
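Here is a rough sketch of that pattern. NameList is a made-up type, and the extra bool parameter on the internal constructor is just my way of distinguishing the overload, not an established convention.

```csharp
using System;

public sealed class NameList
{
    private readonly string[] names;

    // Public constructor: copies the input so the caller cannot
    // change our state later through their own array.
    public NameList(params string[] names)
    {
        this.names = (string[])names.Clone();
    }

    // Internal constructor for use by our own methods: takes ownership
    // of an array we just built, so no second copy is needed.
    internal NameList(string[] ownedNames, bool takeOwnership)
    {
        this.names = ownedNames;
    }

    public int Count { get { return names.Length; } }

    public string this[int index] { get { return names[index]; } }

    // "Mutating" operations build and return a new instance.
    public NameList Add(string name)
    {
        var copy = new string[names.Length + 1];
        Array.Copy(names, copy, names.Length);
        copy[names.Length] = name;
        return new NameList(copy, true);
    }
}
```

Usage: var a = new NameList("x"); var b = a.Add("y"); leaves a with one element and gives b two. State is only ever assigned inside a constructor.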
How can I guarantee that the type is 100% immutable?
In .NET it's tricky to get a guarantee like this, because you can use reflection to modify (mutate) private members.
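As a sketch of what that looks like (Temperature is a made-up class; I am assuming the usual behaviour where FieldInfo.SetValue succeeds on a private readonly instance field when the code has sufficient trust):

```csharp
using System.Reflection;

// Looks immutable: one private readonly field, set only in the constructor.
public sealed class Temperature
{
    private readonly int degrees;

    public Temperature(int degrees) { this.degrees = degrees; }

    public int Degrees { get { return degrees; } }
}

public static class ReflectionDemo
{
    public static int MutateViaReflection()
    {
        var t = new Temperature(20);

        // Look up the private instance field by name and overwrite it.
        FieldInfo field = typeof(Temperature).GetField(
            "degrees", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(t, 99);

        return t.Degrees; // reads the overwritten value
    }
}
```

So readonly protects against ordinary code, not against code that deliberately goes around the type system.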
The previous posters have already stated that you should assign values to your fields in the constructor and then keep your hands off them. But that is sometimes easier said than done. Let's say that your immutable object exposes a property of the type List<string>. Is that list allowed to change? And if not, how will you control it?
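One possible way to control it, as a sketch: copy the incoming list and expose only a read-only wrapper. Playlist is a made-up type, and this works cleanly here only because string itself is immutable.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class Playlist
{
    private readonly ReadOnlyCollection<string> tracks;

    public Playlist(IEnumerable<string> tracks)
    {
        // Copy first, so later changes to the caller's collection are not
        // seen here; then wrap, so consumers cannot add, remove or replace.
        this.tracks = new List<string>(tracks).AsReadOnly();
    }

    // Exposes the elements without exposing the underlying List<string>.
    public ReadOnlyCollection<string> Tracks { get { return tracks; } }
}
```

Because nobody else holds the inner List<string>, the sequence cannot change after construction. Attempts to modify it through the IList<string> interface throw NotSupportedException.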
Eric Lippert has written a series of posts in his blog about immutability in C# that you might find interesting: you find the first part here.
One thing that I think might be missed in all these answers is that I think that an object can be considered immutable even if its internal state changes - as long as those internal changes are not visible to the 'client' code.
For example, the System.String class is immutable, but I think it would be permitted to cache the hash code for an instance so the hash is only calculated on the first call to GetHashCode(). Note that as far as I know, the System.String class does not do this, but I think it could and still be considered immutable. Of course any of these changes would have to be handled in a thread-safe manner (in keeping with the non-observable aspect of the changes).
To be honest though, I can't think of many reasons one might want or need this type of 'invisible mutability'.
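For what it's worth, a sketch of such a lazily cached hash code is below. Name is a made-up type. The cache field is deliberately not readonly, and I am relying on int reads and writes being atomic, so the worst case is that two threads compute the same value twice.

```csharp
using System;

public sealed class Name
{
    private readonly string first;
    private readonly string last;
    private int cachedHash; // 0 means "not computed yet"

    public Name(string first, string last)
    {
        if (first == null) throw new ArgumentNullException("first");
        if (last == null) throw new ArgumentNullException("last");
        this.first = first;
        this.last = last;
    }

    public override bool Equals(object obj)
    {
        var other = obj as Name;
        return other != null && other.first == first && other.last == last;
    }

    public override int GetHashCode()
    {
        int h = cachedHash;
        if (h == 0)
        {
            h = unchecked(first.GetHashCode() * 31 + last.GetHashCode());
            if (h == 0) h = 1;  // keep 0 reserved for "not computed"
            cachedHash = h;     // internal write, never observable by callers
        }
        return h;
    }
}
```

Callers always get the same answer from GetHashCode, so the object still behaves as immutable even though a private field changes once.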
Here is the definition of immutability from Wikipedia:
"In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created."
Essentially, once the object is created, none of its properties can be changed. An example is the String class. Once a String object is created it cannot be changed. Any operation done to it actually creates a new String object.
Lots of questions there. I'll try to answer each of them individually:
"I am wondering how immutability is defined?" - Straight from the Wikipedia page (and a perfectly accurate/concise definition)
An immutable object is an object whose state cannot be modified after it is created
"If the values aren't exposed as public, so can't be modified, then it's enough?" - Not quite. It can't be modified in any way whatsoever, so you've got to ensure that methods/functions don't change the state of the object, and if performing operations, always return a new instance.
"Can the values be modified inside the type, not by the customer of the type?" - Technically, it can't be modified either inside or by a consumer of the type. In practice, types such as System.String (a reference type, for that matter) exist that can be considered immutable for almost all practical purposes, though not in theory.
"Or can one only set them inside a constructor?" - Yes, in theory that's the only place where state (variables) can be set.
"If so, in the cases of double initialization (using the this keyword on structs, etc) is still ok for immutable types?" - Yes, that's still perfectly fine, because it's all part of the initialisation (creation) process, and the instance isn't returned until it has finished.
"How can I guarantee that the type is 100% immutable?" - The following conditions should ensure that. (Someone please point out if I'm missing one.)
Don't expose any variables. They should all be kept private (not even protected is acceptable, since derived classes can then modify state).
Don't allow any instance methods to modify state (variables). This should only be done in the constructor, while methods should create new instances using a particular constructor if they require to return a "modified" object.
All members that are exposed (as read-only) or objects returned by methods must themselves be immutable.
Note: you can't ensure the immutability of derived types, since they can define new variables. This is a reason for marking any type you want to be sure is immutable as sealed, so that no derived class can be considered to be of your base immutable type anywhere in code.
Hope that helps.
I've learned that immutability is when you set everything in the constructor and cannot modify it later on during the lifetime of the object.
The definition of immutability can be located on Google.
Example:
immutable - literally, not able to change.
www.filosofia.net/materiales/rec/glosaen.htm
In terms of immutable data structures, the typical definition is write-once-read-many, in other words, as you say, once created, it cannot be changed.
There are some cases which are slightly in the gray area. For instance, .NET strings are considered immutable, because they can't change, however, StringBuilder internally modifies a String object.
An immutable is essentially a class that forces itself to be final from within its own code. Once it is there, nothing can be changed. In my knowledge, things are set in the constructor and then that's it. I don't see how something could be immutable otherwise.
Unfortunately there is no immutable keyword in C#/VB.NET, though it has been debated. But if there are no auto-properties, all fields are declared with the readonly modifier (readonly fields can only be assigned in the constructor), and all fields are declared of an immutable type, you will have assured yourself of immutability.
An immutable object is one whose observable state can never be changed by any plausible sequence of code execution. An immutable type is one which guarantees that any instances exposed to the outside world will be immutable (this requirement is often stated as requiring that the object's state may only be set in its constructor; this isn't strictly necessary in the case of objects with private constructors, nor is it sufficient in the case of objects which call outside methods on themselves during construction).
A point which other answers have neglected, however, is a definition of an object's state. If Foo is a class, the state of a List<Foo> consists of the sequence of object identities contained therein. If the only reference to a particular List<Foo> instance is held by code which will neither cause that sequence to be changed, nor expose it to code that might do so, then that instance will be immutable, regardless of whether the Foo objects referred to therein are mutable or immutable.
To use an analogy, if one has a list of automobile VINs (Vehicle Identification Numbers) printed on tamper-evident paper, the list itself would be immutable even though cars aren't. Even if the list contains ten red cars today, it might contain ten blue cars tomorrow; they would still, however, be the same ten cars.
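The VIN analogy can be sketched in code as follows. Car and the demo method are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// A deliberately mutable element type.
public class Car
{
    public readonly string Vin;
    public string Color;

    public Car(string vin, string color)
    {
        Vin = vin;
        Color = color;
    }
}

public static class VinListDemo
{
    public static bool SameCarDifferentColor()
    {
        var car = new Car("VIN-1", "red");

        // Nobody else holds the inner List<Car>, so the sequence of
        // object identities in this collection can never change.
        ReadOnlyCollection<Car> cars = new List<Car> { car }.AsReadOnly();

        Car before = cars[0];
        car.Color = "blue"; // the car changes; the list of cars does not

        return ReferenceEquals(before, cars[0]) && cars[0].Color == "blue";
    }
}
```

The collection's state (which cars it refers to) is unchanged, even though a property of one of those cars was modified.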
I am running through some tests about using ArrayLists and List.
Speed is very important in my app.
I have tested creating 10000 records in each, finding an item by index and then updating that object for example:
List[i] = newX;
Using the arraylist seems much faster. Is that correct?
UPDATE:
Using the List[i] approach, for my List<T> approach I am using LINQ to find the index, e.g.
....
int index = base.FindIndex(x => x.AlpaNumericString == "PopItem");
base[index] = UpdatedItem;
It is definitely slower than
int index = ArrayList.IndexOf("PopItem");
base[index] = UpdatedItem;
A generic List (List<T>) should always be quicker than an ArrayList.
Firstly, an ArrayList is not strongly-typed and accepts types of object, so if you're storing value types in the ArrayList, they are going to be boxed and unboxed every time they are added or accessed.
A Generic List can be defined to accept only (say) ints, so no boxing or unboxing needs to occur when adding/accessing elements of the list.
If you're dealing with reference types, you're probably still better off with a Generic List over an ArrayList, since although there's no boxing/unboxing going on, your Generic List is type-safe, and there will be no implicit (or explicit) casts required when retrieving your strongly-typed object from the ArrayList's "collection" of object types.
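A minimal side-by-side of the two, just to show where the boxing and the cast appear (the class and method names are made up):

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static int SumArrayList()
    {
        var list = new ArrayList();
        for (int i = 0; i < 5; i++)
        {
            list.Add(i); // Add takes object, so each int is boxed
        }

        int sum = 0;
        foreach (object o in list)
        {
            sum += (int)o; // explicit cast unboxes; a wrong type fails at run time
        }
        return sum;
    }

    public static int SumGenericList()
    {
        var list = new List<int>();
        for (int i = 0; i < 5; i++)
        {
            list.Add(i); // stored directly as int, no boxing
        }

        int sum = 0;
        foreach (int value in list)
        {
            sum += value; // no cast needed; the element type is checked at compile time
        }
        return sum;
    }
}
```

Both methods compute the same result; the difference is the per-element allocation and cast in the ArrayList version.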
There may be some edge-cases where an ArrayList is faster performing than a Generic List, however, I (personally) have not yet come across one. Even the MSDN documentation states:
Performance Considerations
In deciding whether to use the List<T> or ArrayList class, both of which have similar functionality, remember that the List<T> class performs better in most cases and is type safe. If a reference type is used for type T of the List<T> class, the behavior of the two classes is identical. However, if a value type is used for type T, you need to consider implementation and boxing issues.
If a value type is used for type T, the compiler generates an implementation of the List<T> class specifically for that value type. That means a list element of a List<T> object does not have to be boxed before the element can be used, and after about 500 list elements are created the memory saved not boxing list elements is greater than the memory used to generate the class implementation.
Make certain the value type used for type T implements the IEquatable<T> generic interface. If not, methods such as Contains must call the Object.Equals(Object) method, which boxes the affected list element. If the value type implements the IComparable interface and you own the source code, also implement the IComparable<T> generic interface to prevent the BinarySearch and Sort methods from boxing list elements. If you do not own the source code, pass an IComparer<T> object to the BinarySearch and Sort methods.
Moreover, I particularly like the very last section of that paragraph, which states:
It is to your advantage to use the type-specific implementation of the List<T> class instead of using the ArrayList class or writing a strongly typed wrapper collection yourself. The reason is your implementation must do what the .NET Framework does for you already, and the common language runtime can share Microsoft intermediate language code and metadata, which your implementation cannot.
Touché! :)
Based on your recent edit it seems as though you're not performing a 1:1 comparison here. In the List you have a class object and you're looking for the index based on a property, whereas in the ArrayList you just store the values of that property. If so, this is a severely flawed comparison.
To make it a 1:1 comparison you would add the values to the list only, not the class. Or, you would add the class items to the ArrayList. The former would allow you to use IndexOf on both collections. The latter would entail looping through your entire ArrayList and comparing each item till a match was found (and you could do the same for the List), or overriding object.Equals since ArrayList uses that for comparison.
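A sketch of the second option, where both collections hold the same class items. Item and its Code field are stand-ins for the asker's class, which was not shown.

```csharp
using System.Collections;
using System.Collections.Generic;

// Stand-in for the asker's class; Code plays the role of the string property.
public class Item
{
    public readonly string Code;

    public Item(string code) { Code = code; }

    // ArrayList.IndexOf compares with Equals, so override it
    // to compare by Code instead of by reference.
    public override bool Equals(object obj)
    {
        var other = obj as Item;
        return other != null && other.Code == Code;
    }

    public override int GetHashCode() { return Code.GetHashCode(); }
}

public static class LookupDemo
{
    // List<T>: search by predicate on the property.
    public static int FindInList(List<Item> items, string code)
    {
        return items.FindIndex(x => x.Code == code);
    }

    // ArrayList: search for an equal item; relies on Item.Equals above.
    public static int FindInArrayList(ArrayList items, string code)
    {
        return items.IndexOf(new Item(code));
    }
}
```

With both collections holding Item objects, both lookups do a linear scan with one string comparison per element, which is a fairer basis for timing than comparing a predicate search over objects against IndexOf over bare strings.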
For an interesting read, I suggest taking a look at Rico Mariani's post: Performance Quiz #7 -- Generics Improvements and Costs -- Solution. Even in that post Rico also emphasizes the need to benchmark different scenarios. No blanket statement is issued about ArrayLists, although the general consensus is to use generic lists for performance, type safety, and having a strongly typed collection.
Another related article is: Why should I use List and not ArrayList.
ArrayList seems faster? According to the documentation ( http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx ) List should be faster when using a value type, and the same speed when using a reference type. ArrayList is slower with value types because it needs to box/unbox the values when you're accessing them.
I would expect them to be about the same if they are reference types. There is an extra cast/type-check for ArrayList, but nothing huge. Of course, List<T> should be preferred. If speed is the primary concern (which it almost always isn't, at least not in this way), then you might also want to profile an array (T[]) - harder (=more expensive) to add/remove, of course - but if you are just querying/assigning by index, it should be the fastest. I have had to resort to arrays for some very localised performance critical work, but 99.95% of the time this is overkill and should be avoided.
For example, for any of the 3 approaches (List<T>/ArrayList/T[]) I would expect the assignment cost to be insignificant compared to the cost of newing up the new instance to put into the storage.
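If you want to measure it yourself, something along these lines would do as a starting point. It is a rough Stopwatch sketch, not a rigorous benchmark: the names are made up, there is no warm-up pass, and the numbers will vary by machine and runtime.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class IndexAssignBenchmark
{
    private static long TimeMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Returns elapsed milliseconds for T[], List<T> and ArrayList, in that order.
    public static long[] Run(int count, int passes)
    {
        var array = new string[count];
        var list = new List<string>(new string[count]);   // pre-filled with nulls
        var arrayList = new ArrayList(new string[count]); // pre-filled with nulls
        const string value = "x";

        long tArray = TimeMs(() =>
        {
            for (int p = 0; p < passes; p++)
                for (int i = 0; i < count; i++)
                    array[i] = value;
        });

        long tList = TimeMs(() =>
        {
            for (int p = 0; p < passes; p++)
                for (int i = 0; i < count; i++)
                    list[i] = value;
        });

        long tArrayList = TimeMs(() =>
        {
            for (int p = 0; p < passes; p++)
                for (int i = 0; i < count; i++)
                    arrayList[i] = value;
        });

        return new[] { tArray, tList, tArrayList };
    }
}
```

Call it with something like IndexAssignBenchmark.Run(10000, 1000) in a release build, and compare the three numbers against the time it takes to construct the objects being stored.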
Marc Gravell touched on this in his answer - I think it needs to be stressed.
It is usually a waste of time to prematurely optimize your code!
A better approach is to do a simple, well designed first implementation, and test it with anticipated real world data loads.
Often, you will find that it's "fast enough". (It helps to start out with a clear definition of "fast enough" - e.g. "Must be able to find a single CD in a 10,000 CD collection in 3 seconds or less")
If it's not, put a profiler on it. Almost invariably, the bottleneck will NOT be where you expect.
(I learned this the hard way when I brought a whole app to its knees with a single badly chosen string concatenation)