I'm trying to understand the design decision behind this part of the language. I admit I'm very new to it all, but this is something that caught me out initially and I was wondering if I'm missing an obvious reason. Consider the following code:
List<int> MyList = new List<int>() { 5, 4, 3, 2, 1 };
int[] MyArray = {5,4,3,2,1};
//Sort the list
MyList.Sort();
//This was an instance method
//Sort the Array
Array.Sort(MyArray);
//This was a static method
Why are they not both implemented in the same way - intuitively to me it would make more sense if they were both instance methods?
The question is interesting because it reveals details of the .NET type system. Like value types, string, and delegate types, array types get special treatment in .NET. The most notable oddity is that you never explicitly declare an array type. The compiler takes care of it for you, with ample help from the jitter. System.Array is an abstract type; you get dedicated array types in the process of writing code, either by explicitly creating a type[] or by using generic classes that have an array in their base implementation.
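The relationship between System.Array and the concrete array types can be checked with reflection. The following is a minimal sketch; the class and method names are made up for illustration, and only typeof, Type.IsAbstract, and Type.BaseType come from the framework.

```csharp
using System;

public static class ArrayTypeDemo
{
    // System.Array itself is abstract; it cannot be instantiated with 'new'.
    public static bool ArrayIsAbstract() => typeof(Array).IsAbstract;

    // A concrete array type such as int[] has System.Array as its base type.
    public static bool IntArrayDerivesFromArray() => typeof(int[]).BaseType == typeof(Array);

    // Each element type gets its own dedicated array type.
    public static bool ElementTypesGiveDistinctTypes() => typeof(int[]) != typeof(string[]);

    public static void Main()
    {
        Console.WriteLine(ArrayIsAbstract());               // True
        Console.WriteLine(IntArrayDerivesFromArray());      // True
        Console.WriteLine(ElementTypesGiveDistinctTypes()); // True
    }
}
```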
In a largish program, having hundreds of array types is not unusual. That is okay, but there is overhead involved for each type: storage required for just the type, not for the objects of it. The biggest chunk of it is the so-called 'method table'. In a nutshell, it is a list of pointers to each instance method of the type. The class loader and the jitter work together to fill this table. It is commonly known as the 'v-table', but that isn't quite a match: the table contains pointers to methods that are both non-virtual and virtual.
You can perhaps see where this leads: the designers were worried about having lots of types with big method tables, so they looked for ways to cut down on the overhead.
Array.Sort() was an obvious target.
The same issue is not relevant for generic types. A big nicety of generics, one of many, is that one method table can handle the method pointers for any type argument that is a reference type.
You are comparing two different types of 'object containers':
MyList is a generic collection of type List<int>. List<T> is a wrapper class that represents a strongly typed list of objects, and it provides methods to search, sort, and manipulate its contained objects.
MyArray is a basic data structure of type Array. The Array does not provide the same rich set of instance methods as the List. Arrays can be single-dimensional, multidimensional, or jagged, whilst Lists are single-dimensional only out of the box.
Take a look at this question, it provides a richer discussion about these data types: Array versus List<T>: When to use which?
Without asking someone who was involved in the design of the original platform it's hard to know. But, here's my guess.
In older languages, like C, arrays are dumb data structures: they have no code of their own. Instead, they're manipulated by outside functions. As you move into an object-oriented framework, the closest equivalent is a dumb object (with minimal methods) manipulated by static methods.
So, my guess is that the implementation of .NET Arrays is more a symptom of C style thinking in the early days of development than anything else.
This likely has to do with inheritance. The Array class cannot be manually derived from. But oddly, you can declare an array of anything at all and get an instance of System.Array that is strongly typed, even before generics allowed you to have strongly typed collections. Array seems to be one of those magic parts of the framework.
Also notice that none of the instance methods provided on an array massively modify the array. SetValue() seems to be the only one that changes anything. The Array class itself provides many static methods that can change the content of the array, like Reverse() and Sort(). Not sure if that's significant - maybe someone here can give some background as to why that's the case.
In contrast, List<T> (which wasn't around in the 1.0 framework days) and classes like ArrayList (which was around back then) are just run-of-the-mill classes with no special meaning within the framework. They provide a common .Sort() instance method so that when you inherit from these classes, you get that functionality or can override it.
However, these kinds of sort methods have gone out of vogue anyway as extension methods like Linq's .OrderBy() style sorting have become the next evolution. You can query and sort arrays and Lists and any other enumerable object with the same mechanism now, which is really, really nice.
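As a small illustration of that point, the sketch below sorts an array and a List<int> through one helper that only depends on IEnumerable<int>. The class and method names are hypothetical; OrderBy and ToArray are the standard LINQ operators. Unlike Array.Sort and List<T>.Sort, OrderBy leaves the source untouched and returns a new sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByDemo
{
    // Works for any IEnumerable<int>: arrays, List<int>, and so on.
    // OrderBy does not modify the source; it produces a new sorted sequence.
    public static int[] SortedCopy(IEnumerable<int> source) => source.OrderBy(x => x).ToArray();

    public static void Main()
    {
        int[] myArray = { 5, 4, 3, 2, 1 };
        var myList = new List<int> { 5, 4, 3, 2, 1 };

        Console.WriteLine(string.Join(",", SortedCopy(myArray))); // 1,2,3,4,5
        Console.WriteLine(string.Join(",", SortedCopy(myList)));  // 1,2,3,4,5
        Console.WriteLine(string.Join(",", myArray));             // 5,4,3,2,1 (source unchanged)
    }
}
```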
-- EDIT --
The other, more cynical answer may just be - that's how Java did it so Microsoft did it the same way in the 1.0 version of the framework since at that time they were busy playing catch-up.
One reason might be because Array.Sort was designed in .NET 1.0, which had no generics.
I'm not sure, but I'm thinking maybe just so that arrays are as close to Primitives as they can be.
Related
After so many years, today I decided to look back into some complicated syntax related to arrays, and it didn't take long to realize that Microsoft no longer mentions the dynamic side of arrays; all the examples are static. I literally needed to do a quick search to find out that they moved the concept under new wording: ArrayList classes, or the more generic List classes.
I just want to clarify this change, and whether I need to use the new naming to communicate with our new programmers, since they might not have been taught the old naming conventions at university or in online courses.
MSDN Reference
ArrayList Class (.NET Framework 4.5)
Implements the IList interface using an array whose size is
dynamically increased as required.
// Creates and initializes a new ArrayList.
ArrayList myAL = new ArrayList();
and List<T> as
// Create a list of parts.
List<Part> parts = new List<Part>();
Please read this before you go ahead. What I am asking is whether Microsoft dropped the wording "Dynamic Arrays" from its glossary for arrays or strongly typed lists.
Yes, and ArrayList is effectively "dropped" as well (even though it still exists). The page you linked about ArrayList states:
For a strongly-typed alternative to ArrayList, consider using List<T>.
And that's exactly what you should do. Generics (introduced in .NET 2.0) were intended to fix the problems with having a bunch of weakly-typed ArrayLists everywhere. Use List<T> (or another generic collection) when you need a dynamically sized collection of items.
While dynamic is indeed the opposite of static in English, it's true that in C# and other similar languages (like Java) the common use of these terms has changed. What you call an immutable-length array is often referred to as fixed, while an expandable collection like an ArrayList (or better, List<T>, as others pointed out) is just referred to as dynamically sizable. Static is now more often used to express the concept of class members, while dynamic is often used to express the general concept of runtime binding, as opposed to compile-time, "static" binding. Binding could be, for example, method resolution or variable expansion.
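To make the fixed versus dynamically sizable distinction concrete, here is a minimal sketch. The class and method names are invented for illustration; List<T>.Add and Array.Resize are the real framework calls.

```csharp
using System;
using System.Collections.Generic;

public static class SizingDemo
{
    // A List<T> grows on demand: Count goes up with every Add.
    public static int GrowList(int items)
    {
        var list = new List<int>();
        for (int i = 0; i < items; i++)
            list.Add(i);
        return list.Count;
    }

    // An array's Length is fixed at creation. Array.Resize does not grow it in place:
    // it allocates a new array, copies the elements, and reassigns the variable.
    public static bool ResizeAllocatesNewArray()
    {
        int[] original = new int[3];
        int[] resized = original;
        Array.Resize(ref resized, 6);
        return !object.ReferenceEquals(original, resized)
            && original.Length == 3
            && resized.Length == 6;
    }

    public static void Main()
    {
        Console.WriteLine(GrowList(10));              // 10
        Console.WriteLine(ResizeAllocatesNewArray()); // True
    }
}
```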
I saw this reply from Jon on Initialize generic object with unknown type:
If you want a single collection to
contain multiple unrelated types of
values, however, you will have to use
List<object>
I'm not comparing ArrayList vs List<>, but ArrayList vs List<object>, as both will be exposing elements of type object. What would be the benefit of using either one in this case?
EDIT: Type safety is not a concern here, since both classes expose object as their item type. One still needs to cast from object to the desired type. I'm more interested in anything other than type safety.
EDIT: Thanks Marc Gravell and Sean for the answer. Sorry, I can only pick 1 as answer, so I'll up vote both.
You'll be able to use the LINQ extension methods directly with List<object>, but not with ArrayList, unless you inject a Cast<object>() / OfType<object>() (thanks to IEnumerable<object> vs IEnumerable). That's worth quite a bit, even if you don't need type safety etc.
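The difference looks like this in practice. This is only a sketch with hypothetical names (LinqOnArrayListDemo, CountStrings); Cast<object>() and Count() are the standard LINQ operators.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class LinqOnArrayListDemo
{
    // List<object> implements IEnumerable<object>, so LINQ operators apply directly.
    public static int CountStrings(List<object> items) => items.Count(x => x is string);

    // ArrayList only implements the non-generic IEnumerable, so it needs
    // Cast<object>() (or OfType<T>()) before the other LINQ operators are available.
    public static int CountStrings(ArrayList items) => items.Cast<object>().Count(x => x is string);

    public static void Main()
    {
        var generic = new List<object> { 1, "two", 3.0, "four" };
        var legacy = new ArrayList { 1, "two", 3.0, "four" };

        Console.WriteLine(CountStrings(generic)); // 2
        Console.WriteLine(CountStrings(legacy));  // 2
    }
}
```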
The speed will be about the same; structs will still be boxed, etc - so there isn't much else to tell them apart. Except that I tend to see ArrayList as "oops, somebody is writing legacy code again..." ;-p
One big benefit to using List<object> is that these days most code is written to use the generic classes/interfaces. I suspect that these days most people would write a method that takes an IList<object> instead of an IList. Since ArrayList doesn't implement IList<object>, you wouldn't be able to use an ArrayList in these scenarios.
I tend to think of the non-generic classes/interfaces as legacy code and avoid them whenever possible.
In this case, ArrayList vs. List<Object>, you won't notice any difference in speed. There might be some differences in the actual methods available on each of these, particularly in .NET 3.5 when counting extension methods, but that has more to do with ArrayList being somewhat deprecated than anything else.
Yes, besides being typesafe, generic collections might be actually faster.
From the MSDN (http://msdn.microsoft.com/en-us/library/system.collections.generic.aspx)
The System.Collections.Generic
namespace contains interfaces and
classes that define generic
collections, which allow users to
create strongly typed collections that
provide better type safety and
performance than non-generic strongly
typed collections.
Do some benchmarking and you will know what performs best. I guesstimate that the difference is very small.
List<> is a typesafe version of ArrayList. It will guarantee that you will get the same object type in the collection.
I'm looking at the new C# feature of tuples. I'm curious, what problem was the tuple designed to solve?
What have you used tuples for in your apps?
Update
Thanks for the answers thus far, let me see if I have things straight in my mind.
A good example of a tuple has been pointed out as coordinates. Does this look right?
var coords = Tuple.Create(geoLat,geoLong);
Then use the tuple like so:
var myLatlng = new google.maps.LatLng("+ coords.Item1 + ", "+ coords.Item2 + ");
Is that correct?
When writing programs it is extremely common to want to logically group together a set of values which do not have sufficient commonality to justify making a class.
Many programming languages allow you to logically group together a set of otherwise unrelated values without creating a type in only one way:
void M(int foo, string bar, double blah)
Logically this is exactly the same as a method M that takes one argument which is a 3-tuple of int, string, double. But I hope you would not actually make:
class MArguments
{
public int Foo { get; private set; }
... etc
unless MArguments had some other meaning in the business logic.
The concept of "group together a bunch of otherwise unrelated data in some structure that is more lightweight than a class" is useful in many, many places, not just for formal parameter lists of methods. It's useful when a method has two things to return, or when you want to key a dictionary off of two data rather than one, and so on.
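Two of the uses just mentioned, returning two things and keying a dictionary off two values, can be sketched with the framework's Tuple type. The class and method names below are hypothetical; Tuple.Create and the structural equality of Tuple<T1, T2> are part of the framework.

```csharp
using System;
using System.Collections.Generic;

public static class TupleKeyDemo
{
    // Returns two results at once without declaring a dedicated class.
    public static Tuple<int, int> DivRem(int dividend, int divisor)
    {
        return Tuple.Create(dividend / divisor, dividend % divisor);
    }

    // Keys a dictionary off two values. Tuple compares by its items,
    // so a freshly created tuple with equal items finds the same entry.
    public static string LookupCell(int row, int column)
    {
        var cells = new Dictionary<Tuple<int, int>, string>
        {
            { Tuple.Create(0, 0), "origin" },
            { Tuple.Create(2, 3), "treasure" }
        };

        string value;
        return cells.TryGetValue(Tuple.Create(row, column), out value) ? value : "empty";
    }

    public static void Main()
    {
        Tuple<int, int> result = DivRem(17, 5);
        Console.WriteLine(result.Item1 + " remainder " + result.Item2); // 3 remainder 2
        Console.WriteLine(LookupCell(2, 3));                            // treasure
        Console.WriteLine(LookupCell(1, 1));                            // empty
    }
}
```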
Languages like F# which support tuple types natively provide a great deal of flexibility to their users; they are an extremely useful set of data types. The BCL team decided to work with the F# team to standardize on one tuple type for the framework so that every language could benefit from them.
However, there is at this point no language support for tuples in C#. Tuples are just another data type like any other framework class; there's nothing special about them. We are considering adding better support for tuples in hypothetical future versions of C#. If anyone has any thoughts on what sort of features involving tuples you'd like to see, I'd be happy to pass them along to the design team. Realistic scenarios are more convincing than theoretical musings.
Tuples provide an immutable implementation of a collection
Aside from the common uses of tuples:
to group common values together without having to create a class
to return multiple values from a function/method
etc...
Immutable objects are inherently thread safe:
Immutable objects can be useful in multi-threaded applications. Multiple threads can act on data represented by immutable objects without concern of the data being changed by other threads. Immutable objects are therefore considered to be more thread-safe than mutable objects.
From "Immutable Object" on wikipedia
It provides an alternative to ref or out if you have a method that needs to return multiple new objects as part of its response.
It also allows you to use a built-in type as a return type if all you need to do is mash-up two or three existing types, and you don't want to have to add a class/struct just for this combination. (Ever wish a function could return an anonymous type? This is a partial answer to that situation.)
It's often helpful to have a "pair" type, just used in quick situations (like returning two values from a method). Tuples are a central part of functional languages like F#, and C# picked them up along the way.
Very useful for returning two values from a function.
Personally, I find Tuples to be an iterative part of development when you're in an investigative cycle, or just "playing". Because a Tuple is generic, I tend to think of it when working with generic parameters - especially when wanting to develop a generic piece of code, and I'm starting at the code end, instead of asking myself "how would I like this call to look?".
Quite often I realise that the collection that the Tuple forms becomes part of a list, and staring at List<Tuple<...>> doesn't really express the intention of the list, or how it works. I often "live" with it, but find myself wanting to manipulate the list and change a value, at which point I don't necessarily want to create a new Tuple for that, so I need to create my own class or struct to hold it and add manipulation code.
Of course, there's always extension methods - but quite often you don't want to extend that extra code to generic implementations.
There have been times I've wanted to express data as a Tuple and not had Tuples available (VS2008), in which case I've just created my own Tuple class, and I don't make it thread safe (immutable).
So I guess I'm of the opinion that Tuples are lazy programming at the expense of losing a type name that describes its purpose. The other expense is that you have to declare the signature of the Tuple wherever it's used as a parameter. After a number of methods that begin to look bloated, you may feel as I do, that it is worth making a class, as it cleans up the method signatures.
I tend to start by having the class as a public member of the class you're already working in. But the moment it extends beyond simply a collection of values, it gets its own file, and I move it out of the containing class.
So in retrospect, I believe I use Tuples when I don't want to go off and write a class, and just want to think about what I'm writing right now. Which means the signature of the Tuple may change quite a lot in the next half an hour whilst I figure out what data I am going to need for this method, and how it's returning whatever values it will return.
If I get a chance to refactor code, then often I'll question a Tuple's place in it.
This is an old question from 2010; now, in 2017, .NET has changed and become smarter.
C# 7 introduces language support for tuples, which enables semantic names for the fields of a tuple using new, more efficient tuple types.
In VS 2017 and .NET 4.7 (or after installing the NuGet package System.ValueTuple), you can create and use a tuple in a very efficient and simple way:
var person = (Id: "123", Name: "john"); // create a tuple with two items
Console.WriteLine($"{person.Id} name:{person.Name}"); // access its fields
Returning more than one value from a method:
public (double sum, double average) ComputeSumAndAverage(List<double> list)
{
var sum= list.Sum();
var average = sum/list.Count;
return (sum, average);
}
How to use:
var list=new List<double>{1,2,3};
var result = ComputeSumAndAverage(list);
Console.WriteLine($"Sum={result.sum} Average={result.average}");
For more details read: https://learn.microsoft.com/en-us/dotnet/csharp/tuples
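A related C# 7 feature is deconstruction, which unpacks a returned tuple straight into local variables. The sketch below assumes a hypothetical MinMax helper that is not part of the answer above; the tuple return syntax and the var (a, b) deconstruction form are standard C# 7.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeconstructDemo
{
    // Named tuple elements document what each position means.
    public static (double Min, double Max) MinMax(List<double> values)
    {
        return (values.Min(), values.Max());
    }

    public static void Main()
    {
        // Deconstruct the returned tuple into two locals in one statement.
        var (min, max) = MinMax(new List<double> { 3, 1, 2 });
        Console.WriteLine($"Min={min} Max={max}"); // Min=1 Max=3
    }
}
```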
A Tuple is often used to return multiple values from functions when you don’t want to create a specific type. If you're familiar with Python, Python has had this for a long time.
Returning more than one value from a function. getCoordinates() isn't very useful if it just returns x or y or z, but making a full class and object to hold three ints also seems pretty heavyweight.
A common use might be to avoid creating classes/structs that only contains 2 fields, instead you create a Tuple (or a KeyValuePair for now).
Useful as a return value; avoids passing N out params...
I find the KeyValuePair refreshing in C# to iterate over the key value pairs in a Dictionary.
It's really helpful when returning values from functions. We can get multiple values back, and this is quite a saver in some scenarios.
I stumbled upon this performance benchmark between Tuples and KeyValuePairs, and you will probably find it interesting. In summary, it says that Tuple has an advantage because it is a class: it is stored on the heap rather than the stack, and when it is passed around as an argument only its reference is copied. KeyValuePair is a struct, so it is faster to allocate but slower when used.
http://www.dotnetperls.com/tuple-keyvaluepair
I often hear that Haskell (which I don't know) has a very interesting type system. I'm very familiar with Java and a little with C#, and sometimes I find myself fighting the type system so that some design fits or works better in a certain way.
That led me to wonder...
What are the problems that occur somehow because of deficiencies of Java/C# type system?
How do you deal with them?
Arrays are broken.
Object[] foo = new String[1];
foo[0] = new Integer(4);
Gives you java.lang.ArrayStoreException
You deal with them with caution.
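C# has the same hole for arrays of reference types; there the failed store surfaces as an ArrayTypeMismatchException. A minimal sketch, with a hypothetical class and method name:

```csharp
using System;

public static class ArrayCovarianceDemo
{
    // Returns true if storing an int into a string[] viewed as object[] fails at run time.
    public static bool StoreFailsAtRuntime()
    {
        object[] foo = new string[1]; // compiles: reference-type arrays are covariant
        try
        {
            foo[0] = 4;               // compiles too, but the runtime checks the element type
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StoreFailsAtRuntime()); // True
    }
}
```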
Nullability is another big issue. NullPointerExceptions jump at your face everywhere. You really can't do anything about them except switch language, or use conventions of avoiding them as much as possible (initialize fields properly, etc).
More generally, the type systems of Java and C# are not very expressive. The most important thing Haskell can give you is that with its types you can enforce that functions don't have side effects. Having a compile-time proof that parts of programs are just expressions that are evaluated makes programs much more reliable, composable, and easier to reason about. (Ignore the fact that implementations of Haskell give you ways to bypass that.)
Compare that to Java, where calling a method can do almost anything!
Haskell also has pattern matching, which gives you a different way of creating programs: you have data on which functions operate, often recursively. In pattern matching you destructure data to see what kind it is, and behave accordingly. For example, you have a list, which is either empty or a head and a tail. If you want to calculate the length, you define a function that says: if the list is empty, length = 0; otherwise length = 1 + length(tail).
If you would really like to learn more, there are two excellent online sources:
Learn you a Haskell and Real World Haskell
I dislike the fact that there is a differentiation between primitive (native) types (int, boolean, double) and their corresponding class-wrappers (Integer, Boolean, Double) in Java.
This is often quite annoying, especially when writing generic code. Native types can't be used as generic type arguments; you must use a wrapper instead. Generics should make your code more abstract and more easily reusable, but in Java they bring restrictions for no obvious reason.
private static <T> T First(T arg[]) {
return arg[0];
}
public static void main(String[] args) {
int x[] = {1, 2, 3};
Integer y[] = {3, 4, 5};
First(x); // Wrong
First(y); // Fine
}
In .NET there are no such problems even though there are separate value and reference types, because they strictly realized "everything is an object".
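For comparison, here is the same helper in C#, where a value-type array is a valid argument to the generic method. This is a sketch; the class name is made up for illustration.

```csharp
using System;

public static class GenericFirstDemo
{
    // T can be a value type or a reference type; no wrapper class is needed.
    public static T First<T>(T[] arg) => arg[0];

    public static void Main()
    {
        int[] x = { 1, 2, 3 };
        string[] y = { "a", "b" };

        Console.WriteLine(First(x)); // 1
        Console.WriteLine(First(y)); // a
    }
}
```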
This question about generics shows the deficiencies of the Java type system's expressiveness:
Higher-kinded generics in Java
I don't like the fact that classes are not first-class objects, and you can't do fancy things such as having a static method be part of an interface.
A fundamental weakness in the Java/.NET type system is that it has no declarative means of specifying how an object's state relates to the contents of its reference-type fields, nor of specifying whether a method is allowed to persist reference-type parameters. Although in some sense it's nice for the runtime to be able to use a field Foo of one type ICollection<integer> to mean many different things, it's not possible for the type system to provide real support for things like immutability, equivalency testing, cloning, or any other such features without knowing whether Foo represents:
1. A read-only reference to a collection which nothing will ever mutate; the class may freely share such reference with outside code, without affecting its semantics. The reference encapsulates only immutable state, and likely does not encapsulate identity.
2. A writable reference to a collection whose type is mutable, but which nothing will ever actually mutate; the class may only share such references with code that can be trusted not to mutate it. As above, the reference encapsulates only immutable state, and likely does not encapsulate identity.
3. The only reference anywhere in the universe to a collection which it mutates. The reference would encapsulate mutable state, but would not encapsulate identity (replacing the collection with another holding the same items would not change the state of the enclosing object).
4. A reference to a collection which it mutates, and whose contents it considers to be its own, but to which outside code holds references which it expects to be attached to Foo's current state. The reference would encapsulate both identity and mutable state.
5. A reference to a mutable collection owned by some other object, which it expects to be attached to that other object's state (e.g. if the object holding Foo is supposed to display the contents of some other collection). That reference would encapsulate identity, but would not encapsulate mutable state.
Suppose one wants to copy the state of the object that contains Foo to a new, detached, object. If Foo represents #1 or #2, one may store in the new object either a copy of the reference in Foo, or a reference to a new object holding the same data; copying the reference would be faster, but both operations would be correct. If Foo represents #3, a correct detached copy must hold a reference to a new detached object whose state is copied from the original. If Foo represents #5, a correct detached copy must hold a copy of the original reference--it must NOT hold reference to a new detached object. And if Foo represents #4, the state of the object containing it cannot be copied in isolation; it might be possible to copy a bunch of interconnected objects to yield a new bunch whose state is equivalent to the original, but it would not be possible to copy the state of objects individually.
While it won't be possible for a type system to specify declaratively all of the possible relationships that can exist among objects and what should be done about them, it should be possible for a type system and framework to correctly generate code to produce semantically-correct equivalence tests, cloning methods, smoothly inter-operable mutable, immutable, and "readable" types, etc. in most cases, if it knew which fields encapsulate identity, mutable state, both, or neither. Additionally, it should be possible for a framework to minimize defensive copying and wrapping in circumstances where it could ensure that the passed references would not be given to anything that would mutate them.
(Re: C# specifically.)
I would love tagged unions.
Ditto on first-class objects for classes, methods, properties, etc.
Although I've never used them, Python has type classes that basically are the types that represent classes and how they behave.
Non-nullable reference types so null-checks are not needed. It was originally considered for C# but was discarded. (There is a stack overflow question on this.)
Covariance so I can cast a List<string> to a List<object>.
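On the covariance wish: List<T> remains invariant, but since C# 4 the read-only interface IEnumerable<out T> is covariant, which covers many of these cases. A minimal sketch, with hypothetical class and method names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarianceDemo
{
    public static int CountAll(IEnumerable<object> items) => items.Count();

    public static void Main()
    {
        var strings = new List<string> { "a", "b", "c" };

        // List<object> objects = strings;   // does not compile: List<T> is invariant

        // IEnumerable<out T> is covariant, so this read-only view is allowed.
        IEnumerable<object> objects = strings;

        Console.WriteLine(CountAll(objects)); // 3
        Console.WriteLine(CountAll(strings)); // 3
    }
}
```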
This is minor, but for the current versions of Java and C# declaring objects breaks the DRY principle:
Object foo = new Object();
Integer x = new Integer(0);
None of them have meta-programming facilities like say that old darn C++ dog has.
Using "using" duplication and lack of typedef is one example that violates DRY and can even cause user-induced 'aliasing' errors and more. Java 'templates' isn't even worth mentioning..
I am running through some tests about using ArrayLists and List.
Speed is very important in my app.
I have tested creating 10000 records in each, finding an item by index and then updating that object for example:
List[i] = newX;
Using the ArrayList seems much faster. Is that correct?
UPDATE:
Using the List[i] approach: for my List<T> version I am using FindIndex with a lambda to find the index, e.g.
....
int index = base.FindIndex(x => x.AlpaNumericString == "PopItem");
base[index] = UpdatedItem;
It is definitely slower than
int index = base.IndexOf("PopItem");
base[index] = UpdatedItem;
A generic List (List<T>) should always be quicker than an ArrayList.
Firstly, an ArrayList is not strongly-typed and accepts types of object, so if you're storing value types in the ArrayList, they are going to be boxed and unboxed every time they are added or accessed.
A generic List can be defined to accept only (say) ints, so no boxing or unboxing needs to occur when adding or accessing elements of the list.
If you're dealing with reference types, you're probably still better off with a Generic List over an ArrayList, since although there's no boxing/unboxing going on, your Generic List is type-safe, and there will be no implicit (or explicit) casts required when retrieving your strongly-typed object from the ArrayList's "collection" of object types.
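The boxing and casting difference described above shows up directly in the code you have to write. This is a minimal sketch with hypothetical names; it illustrates the cast, not the timing.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Each int was boxed when it was added to the ArrayList,
    // and must be unboxed with an explicit cast on every read.
    public static int SumArrayList(ArrayList items)
    {
        int total = 0;
        foreach (object boxed in items)
            total += (int)boxed;
        return total;
    }

    // List<int> stores the ints directly: no boxing, no cast.
    public static int SumList(List<int> items)
    {
        int total = 0;
        foreach (int value in items)
            total += value;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumArrayList(new ArrayList { 1, 2, 3 })); // 6
        Console.WriteLine(SumList(new List<int> { 1, 2, 3 }));      // 6
    }
}
```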
There may be some edge-cases where an ArrayList is faster performing than a Generic List, however, I (personally) have not yet come across one. Even the MSDN documentation states:
Performance Considerations
In deciding whether to use the List<T> or ArrayList class, both of which have similar functionality, remember that the List<T> class performs better in most cases and is type safe. If a reference type is used for type T of the List<T> class, the behavior of the two classes is identical. However, if a value type is used for type T, you need to consider implementation and boxing issues.
If a value type is used for type T, the compiler generates an implementation of the List<T> class specifically for that value type. That means a list element of a List<T> object does not have to be boxed before the element can be used, and after about 500 list elements are created the memory saved not boxing list elements is greater than the memory used to generate the class implementation.
Make certain the value type used for type T implements the IEquatable<T> generic interface. If not, methods such as Contains must call the Object.Equals(Object) method, which boxes the affected list element. If the value type implements the IComparable interface and you own the source code, also implement the IComparable<T> generic interface to prevent the BinarySearch and Sort methods from boxing list elements. If you do not own the source code, pass an IComparer<T> object to the BinarySearch and Sort methods.
Moreover, I particularly like the very last section of that paragraph, which states:
It is to your advantage to use the type-specific implementation of the List<T> class instead of using the ArrayList class or writing a strongly typed wrapper collection yourself. The reason is your implementation must do what the .NET Framework does for you already, and the common language runtime can share Microsoft intermediate language code and metadata, which your implementation cannot.
Touché! :)
Based on your recent edit it seems as though you're not performing a 1:1 comparison here. In the List you have a class object and you're looking for the index based on a property, whereas in the ArrayList you just store the values of that property. If so, this is a severely flawed comparison.
To make it a 1:1 comparison you would add the values to the list only, not the class. Or, you would add the class items to the ArrayList. The former would allow you to use IndexOf on both collections. The latter would entail looping through your entire ArrayList and comparing each item till a match was found (and you could do the same for the List), or overriding object.Equals since ArrayList uses that for comparison.
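A like-for-like setup along the lines of the second option could look like the sketch below: the same class instances go into both collections, and Equals is overridden so that ArrayList.IndexOf can match on the property. The Item class, its Key property, and the helper names are all hypothetical stand-ins for the asker's types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class Item
{
    public string Key { get; set; }

    // ArrayList.IndexOf relies on Object.Equals, so equality is defined on Key here.
    public override bool Equals(object obj)
    {
        var other = obj as Item;
        return other != null && other.Key == Key;
    }

    public override int GetHashCode()
    {
        return Key == null ? 0 : Key.GetHashCode();
    }
}

public static class LikeForLikeDemo
{
    // List<T>: search by predicate on the property.
    public static int FindInList(List<Item> items, string key)
    {
        return items.FindIndex(x => x.Key == key);
    }

    // ArrayList: search with a probe object; matching goes through Item.Equals.
    public static int FindInArrayList(ArrayList items, string key)
    {
        return items.IndexOf(new Item { Key = key });
    }

    public static void Main()
    {
        var a = new Item { Key = "First" };
        var b = new Item { Key = "PopItem" };

        Console.WriteLine(FindInList(new List<Item> { a, b }, "PopItem"));   // 1
        Console.WriteLine(FindInArrayList(new ArrayList { a, b }, "PopItem")); // 1
    }
}
```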
For an interesting read, I suggest taking a look at Rico Mariani's post: Performance Quiz #7 -- Generics Improvements and Costs -- Solution. Even in that post Rico also emphasizes the need to benchmark different scenarios. No blanket statement is issued about ArrayLists, although the general consensus is to use generic lists for performance, type safety, and having a strongly typed collection.
Another related article is: Why should I use List and not ArrayList.
ArrayList seems faster? According to the documentation ( http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx ) List should be faster when using a value type, and the same speed when using a reference type. ArrayList is slower with value types because it needs to box/unbox the values when you're accessing them.
I would expect them to be about the same if they are reference-types. There is an extra cast/type-check for ArrayList, but nothing huge. Of course, List<T> should be preferred. If speed is the primary concern (which it almost always isn't, at least not in this way), then you might also want to profile an array (T[]) - harder (=more expensive) to add/remove, of course - but if you are just querying/assigning by index, it should be the fastest. I have had to resort to arrays for some very localised performance critical work, but 99.95% of the time this is overkill and should be avoided.
For example, for any of the 3 approaches (List<T>/ArrayList/T[]) I would expect the assignment cost to be insignificant to the cost of newing up the new instance to put into the storage.
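If you want to measure the three approaches yourself, a rough Stopwatch harness is enough to get a first impression. Everything below is an assumption for illustration: the class name, the Time helper, and the sizes are invented, and the delegate call adds overhead of its own, so treat the numbers as indicative only and run a Release build.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class IndexAssignBenchmark
{
    // Runs 'rounds' passes that call 'assign' for every index; returns elapsed milliseconds.
    public static long Time(Action<int> assign, int size, int rounds)
    {
        var watch = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            for (int i = 0; i < size; i++)
                assign(i);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int size = 10000;
        const int rounds = 1000;
        var replacement = new object();

        var array = new object[size];
        var list = new List<object>(array);
        var arrayList = new ArrayList(array);

        // Timings vary by machine and build, so no expected output is shown.
        Console.WriteLine("T[]:       " + Time(i => array[i] = replacement, size, rounds) + " ms");
        Console.WriteLine("List<T>:   " + Time(i => list[i] = replacement, size, rounds) + " ms");
        Console.WriteLine("ArrayList: " + Time(i => arrayList[i] = replacement, size, rounds) + " ms");
    }
}
```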
Marc Gravell touched on this in his answer - I think it needs to be stressed.
It is usually a waste of time to prematurely optimize your code!
A better approach is to do a simple, well designed first implementation, and test it with anticipated real world data loads.
Often, you will find that it's "fast enough". (It helps to start out with a clear definition of "fast enough" - e.g. "Must be able to find a single CD in a 10,000 CD collection in 3 seconds or less")
If it's not, put a profiler on it. Almost invariably, the bottleneck will NOT be where you expect.
(I learned this the hard way when I brought a whole app to its knees with a single badly chosen string concatenation.)