When to use each of T[], List<T>, IEnumerable<T>? - c#

I usually find myself doing something like:
string[] things = arrayReturningMethod();
int index = things.ToList<string>.FindIndex((s) => s.Equals("FOO"));
//do something with index
return things.Distinct(); //which returns an IEnumerable<string>
and I find all this mixup of types/interface a bit confusing and it tickles my potential performance problem antennae (which I ignore until proven right, of course).
Is this idiomatic and proper C# or is there a better alternative to avoid casting back and forth to access the proper methods to work with the data?
EDIT:
The question is actually twofold:
When is it proper to use either the IEnumerable interface or an array or a list (or any other IEnumerable implementing type) directly (when accepting parameters)?
Should you freely move between IEnumerables (implementation unknown) and lists and IEnumerables and arrays and arrays and Lists or is that non idiomatic (there are better ways to do it)/ non performant (not typically relevant, but might be in some cases) / just plain ugly (unmaintable, unreadable)?

In regards to performance...
Converting from List to T[] involves copying all the data from the original list to a newly allocated array.
Converting from T[] to List also involves copying all the data from the original list to a newly allocated List.
Converting from either List or T[] to IEnumerable involves casting, which is a few CPU cycles.
Converting from IEnumerable to List involves upcasting, which is also a few CPU cycles.
Converting from IEnumerable to T[] also involves upcasting.
You can't cast an IEnumerable to T[] or List unless it was a T[] or List respectively to begin with. You can use the ToArray or ToList functions, but those will also result in a copy being made.
Accessing all the values in order from start to end in a T[] will, in a straightforward loop, be optimized to use straightforward pointer arithmetic -- which makes it the fastest of them all.
Accessing all the values in order from start to end in a List involves a check on each iteration to make sure that you aren't accessing a value outside the array's bounds, and then the actual accessing of the array value.
Accessing all the values in an IEnumerable involves creating an enumerator object, calling the Next() function which increases the index pointer, and then calling the Current property which gives you the actual value and sticks it in the variable that you specified in your foreach statement. Generally, this isn't as bad as it sounds.
Accessing an arbitrary value in an IEnumerable involves starting at the beginning and calling Next() as many times as you need to get to that value. Generally, this is as bad as it sounds.
In regards to idioms...
In general, IEnumerable is useful for public properties, function parameters, and often for return values -- and only if you know that you're going to be using the values sequentially.
For instance, if you had a function PrintValues, if it was written as PrintValues(List<T> values), it would only be able to deal with List values, so the user would first have to convert, if for instance they were using a T[]. Likewise with if the function was PrintValues(T[] values). But if it was PrintValues(IEnumerable<T> values), it would be able to deal with Lists, T[]s, stacks, hashtables, dictionaries, strings, sets, etc -- any collection that implements IEnumerable, which is practically every collection.
In regards to internal use...
Use a List only if you're not sure how many items will need to be in it.
Use a T[] if you know how many items will need to be in it, but need to access the values in an arbitrary order.
Stick with the IEnumerable if that's what you've been given and you just need to use it sequentially. Many functions will return IEnumerables. If you do need to access values from an IEnumerable in an arbitrary order, use ToArray().
Also, note that casting is different from using ToArray() or ToList() -- the latter involves copying the values, which is indeed a performance and memory hit if you have a lot of elements. The former simply is to say that "A dog is an animal, so like any animal, it can eat" (downcast) or "This animal happens to be a dog, so it can bark" (upcast). Likewise, All Lists and T[]s are IEnumerables, but only some IEnumerables are Lists or T[]s.

A good rule of thumb is to always use IEnumerable (when declaring your variables/method parameters/method return types/properties/etc.) unless you have a good reason not to. By far the most type-compatible with other (especially extension) methods.

Well, you've got two apples and an orange that you are comparing.
The two apples are the array and the List.
An array in C# is a C-style array that has garbage collection built in. The upside of using them it that they have very little overhead, assuming you don't need to move things around. The bad thing is that they are not as efficient when you are adding things, removing things, and otherwise changing the array around, as memory gets shuffled around.
A List is a C# style dynamic array (similar to the vector<> class in C++). There is more overhead, but they are more efficient when you need to be moving things around a lot, as they will not try to keep the memory usage contiguous.
The best comparison I could give is saying that arrays are to Lists as strings are to StringBuilders.
The orange is 'IEnumerable'. This is not a datatype, but rather it is an interface. When a class implements the IEnumerable interface, it allows that object to be used in a foreach() loop.
When you return the list (as you did in your example), you were not converting the list to an IEnumerable. A list already is an IEnumerable object.
EDIT: When to convert between the two:
It depends on the application. There is very little that can be done with an array that cannot be done with a List, so I would generally recommend the List. Probably the best thing to do is to make a design decision that you are going to use one or the other, that way you don't have to switch between the two. If you rely on an external library, abstract it away to maintain consistent usage.
Hope this clears a little bit of the fog.

Looks to me like the problem is that you haven't bothered learning how to search an array. Hint: Array.IndexOf or Array.BinarySearch depending on whether the array is sorted.
You're right that converting to a list is a bad idea: it wastes space and time and makes the code less readable. Also, blindly upcasting to IEnumerable slows matters down and also completely prevents use of certain algorithms (such as binary search).

I try to avoid rapidly jumping between data types if it can be avoided.
It must be the case that each situation similar to that you described is sufficiently different so as to prevent a dogmatic rule about transforming your types; however, it is generally good practice to select a data structure that provides as best as possible the interface you need without having to copying elements needlessly to new data structures.

When to use what?
I would suggest returning the most specific type, and taking in the most flexible type.
Like this:
public int[] DoSomething(IEnumerable<int> inputs)
{
//...
}
public List<int> DoSomethingElse(IList<int> inputs)
{
//...
}
That way you can call methods on List< T > for whatever you get back from the method in addition to treating it as an IEnumerable. On the inputs, use as flexible as possible, so you don't dictate the users of your method what kind of collection to create.

You're right to ignore the 'performance problem' antennae until you actually have a performance problem. Most performance problems come from doing too much I/O or too much locking or doing one of them wrong, and none of these apply to this question.
My general approach is:
Use T[] for 'static' or 'snapshot'-style information. Use for things where calling .Add() wouldn't make sense anyway, and you don't need the extra methods List<T> gives you.
Accept IEnumerable<T> if you don't really care what you're given and don't need a constant-time .Length/.Count.
Only return IEnumerable<T> when you're doing simple manipulations of an input IEnumerable<T> or when you specifically want to make use of the yield syntax to do your work lazily.
In all other cases, use List<T>. It's just too flexible.
Corollary to #4: don't be afraid of ToList(). ToList() is your friend. It forces the IEnumerable<T> to evaluate right then (useful for when you're stacking several where clauses). Don't go nuts with it, but feel free to call it once you've built up your full where clause before you do the foreach over it (or the like).
Of course, this is just a rough guideline. Just please try to follow the same pattern in the same codebase -- code styles that jump around make it harder for maintenance coders to get into your frame of mind.

Related

So the invention of lists made IEnumerable classes useless and waste of time?

So with the introduction of lists we can easily do, what we used to do with IEnumerable classes. So now IEnumerable classes mostly used only to understand how internally the lists work. Am i right?
No. While a list is enumerable, the reverse is not quite true.
IEnumerable as an interface states that you can iterate over a series of elements, whereas IList states that the element's order is preserved and you can randomly access them [in an efficient fashion – in .NET]. Although both of these revolve around a collection-ish data structure, the promises the interfaces make are different. That said, many types offer both and therefore implement both interfaces (List<T>, T[], ...).
Think of a data reader or a function computing a series of values (Fibonacci, prime numbers, ...), sets or trees for instance. None of there are lists, yet you can iterate over them using a method that takes an IEnumerable rather than and IList as a parameter.
Despite not strictly matching the tags, it may be worth mentioning that the interface semantics are slightly different in Java. The List interface does not specifiy random access. The Iterable interface though, which is the counterpart of IEnumerable, is similar.
I would suggest that interfaces like IEnumerable, ICollection and IList (in generic and non-generic forms, though ICollection and ICollection<T> have little relation to each other) are best thought of ways by which objects can allow themselves to be used by code which is designed to do various things with sequences of items (which may be countable or addressable by index). Code which simply wants to create a counted ordered sequence of items which is addressable by index can simply use List<T>, and can use that type whether or not it needs something which is counted and addressable by index, but an object which wants to make itself usable by code expecting a sequence of items can implement one of the above interfaces directly, rather than having to copy its contents to a new List<T> every time anybody wants enumerate them.
Note that one major advantage of IEnumerable<T> is that types which implement it do not have to hold the "entire contents" in memory, and can instead generate elements "on the fly". If one has a method that does something with all the items in an IEnumerable<T>, and one wants to perform that action on every item in a 1,000,000-item list where Fnord is greater than 23 (which is true of 95% of the items), having the method accept IEnumerable<T> will avoid any need to create a 950,000-item list containing all the items where Fnord is greater than 23.

C#, collection that holds Lists of different type

I currently have a class that holds 3 dictionaries, each of which contains Lists of the same type within each dictionary, but different types across the dictionaries, such as:
Dictionary1<string, List<int>> ...
Dictionary2<string, List<double>>...
Dictionary3<string, List<DateTime>>...
Is there a way to use a different collection that can hold all the Lists so that I can iterate through the collection of Lists? Being able to iterate is the only requirement of such collection, no sorting, no other operations will be needed.
I want to be able to access the List directly through the string or other identifier and access the List's members. Type safety is not a requirement but in exchange I do not want to have to cast anything, speed is the absolutely top priority here.
So, when calculations are performed on the list's members knowledge of the exact type is assumed, such as "double lastValue = MasterCollection["List1"].Last();", whereas it is assumed that List1 is a List of type double.
Can this be accomplished? Sorry that I may use sometimes incorrect or incomplete terminology I am not a trained programmer or developer.
Thanks,
Matt
To do that you would have to use a non-generic API, such as IList (not IList<T>) - i.e. Dictionary<string, IList>. Or since you just need to iterate, maybe just IEnumerable (not IEnumerable<T>). However! That will mean that you are talking non-generic, so some sacrifices may be necessary (boxing of value types during retrieval, etc).
With an IList/IEnumerable appraoch, to tweak your example:
double lastValue = MasterCollection["List1"].Cast<double>().Last();
You could, of course, write some custom extension methods on IDictionary<string,IList>, allowing something more like:
double lastValue = MasterCollection.Get<double>("List1").Last();
I'm not sure it is worth it, though.
No, what you are trying to do is not possible; namely, the requirement for strong-typing on all of the lists without casting is what's preventing the rest.
If your only requirement is to iterate through each of the items in the list, then you could create your dictionary as a Dictionary<string, IEnumerable> (note the non-generic interface). IEnumerable<T> derives from IEnumerable which would allow you to iterate through each item in the list.
The problem with this is that you would have to perform a cast at some point either to the IEnumerable<T> (assuming you know you are working with it) or use the Cast<T> extension method on the Enumerable class (the latter being worse, as you might incur boxing/unboxing, unless it does type-sniffing, in which case, you wouldn't have a performance penalty).
I would say that you shouldn't store the items in a single list; your example usage shows that you know the type ahead of time (you are assigning to a double) so you are aware at that point in time of the specific typed list.
It's not worth losing type-safety over.

c# array vs generic list [duplicate]

This question already has answers here:
Array versus List<T>: When to use which?
(16 answers)
Closed 9 years ago.
i basically want to know the differences or advantages in using a generic list instead of an array in the below mentioned scenario
class Employee
{
private string _empName;
public string EmpName
{
get{ return _empName; }
set{ _empName = value; }
}
}
1. Employee[] emp
2. List<Employee> emp
can anyone please tell me the advantages or disadvantages and which one to prefer?
One big difference is that List<Employee> can be expanded (you can call Add on it) or contracted (you can call Remove on it) whereas Employee[] is fixed in size. Thus, Employee[] is tougher to work with unless the need calls for it.
The biggest difference is that arrays can't be made longer or shorter once they're created. List instances, however can have elements added or removed. There are other diffs too (e.g. different sets of methods available) but add/remove is the big difference.
I like List unless there's a really good reason to use an Array, since the flexibility of List is nice and the perf penalty is very small relative to the cost of most other things your code is usually doing.
If you want to dive into a lot of interesting technical detail, check out this StackOverflow thread which delves into the List vs. Array question in more depth.
With the generic list, you can Add / Remove etc cheaply (at least, at the far end). Resizing an array (to add/remove) is more expensive. The obvious downside is that a list has spare capacity so maybe wastes a few bytes - not worth worrying about in most cases, though (and you can trim it).
Generally, prefer lists unless you know your data never changes size.
API-wise, since LINQ there is little to choose between them (i.e. the extra methods on List<T> are largely duplicated by LINQ, so arrays get them for free).
Another advantage is that with a list you don't need to expose a setter:
private readonly List<Foo> items = new List<Foo>();
public List<Foo> Items { get { return items; } }
eliminating a range of null bugs, and allowing you to keep control over the data (especially if you use a different IList<> implementation that supports inspection / validation when changing the contents).
If you are exposing a collection in a public interface the .NET Framework Guidelines advise to use a List rather than T[]. (In fact, a BindingList< T >)
Internally, an array can be more appropriate if you have a collection which is a fixed, known size. Resizing an array is expensive compared to adding an element to the end of a List.
You need to know the size of an array at the time that it is created, but you cannot change its size after it has been created.
So, it uses dynamic memory allocation for the array at creation time. (This differs from static memory allocation as used for C++ arrays, where the size must be known at compile time.)
A list can grow dynamically AFTER it has been created, and it has the .Add() function to do that.
-from MSDN
Generics Vs Array Lists-SO General comparision.
Generic List vs Arrays-SO Why is generic list slower than array?
Which one to prefer? List<T>.
If you know the number of elements array is a good choice. If not use the list. Internally List<T> uses an array of T so the are actually more like than you may think.
With a List, you don't need to know the size of the array beforehand. You can dynamically add new Employee's based on the needs of your implementation.

Should updating an element in List<T> by index be quicker than using ArrayList index?

I am running through some tests about using ArrayLists and List.
Speed is very important in my app.
I have tested creating 10000 records in each, finding an item by index and then updating that object for example:
List[i] = newX;
Using the arraylist seems much faster. Is that correct?
UPDATE:
Using the List[i] approach, for my List<T> approach I am using LINQ to find the index eg/
....
int index = base.FindIndex(x=>x.AlpaNumericString = "PopItem");
base[index] = UpdatedItem;
It is definately slower than
ArrayList.IndexOf("PopItem"))
base[index] = UpdatedItem;
A generic List (List<T>) should always be quicker than an ArrayList.
Firstly, an ArrayList is not strongly-typed and accepts types of object, so if you're storing value types in the ArrayList, they are going to be boxed and unboxed every time they are added or accessed.
A Generic List can be defined to accept only (say) int's so therefore no boxing or unboxing needs to occur when adding/accessing elements of the list.
If you're dealing with reference types, you're probably still better off with a Generic List over an ArrayList, since although there's no boxing/unboxing going on, your Generic List is type-safe, and there will be no implicit (or explicit) casts required when retrieving your strongly-typed object from the ArrayList's "collection" of object types.
There may be some edge-cases where an ArrayList is faster performing than a Generic List, however, I (personally) have not yet come across one. Even the MSDN documentation states:
Performance Considerations
In deciding whether to use the
List<(Of <(T>)>) or ArrayList class,
both of which have similar
functionality, remember that the
List<(Of <(T>)>) class performs better
in most cases and is type safe. If a
reference type is used for type T of
the List<(Of <(T>)>) class, the
behavior of the two classes is
identical. However, if a value type is
used for type T, you need to consider
implementation and boxing issues.
If a value type is used for type T,
the compiler generates an
implementation of the List<(Of <(T>)>)
class specifically for that value
type. That means a list element of a
List<(Of <(T>)>) object does not have
to be boxed before the element can be
used, and after about 500 list
elements are created the memory saved
not boxing list elements is greater
than the memory used to generate the
class implementation.
Make certain the value type used for
type T implements the IEquatable<(Of
<(T>)>) generic interface. If not,
methods such as Contains must call the
Object..::.Equals(Object) method,
which boxes the affected list element.
If the value type implements the
IComparable interface and you own the
source code, also implement the
IComparable<(Of <(T>)>) generic
interface to prevent the BinarySearch
and Sort methods from boxing list
elements. If you do not own the source
code, pass an IComparer<(Of <(T>)>)
object to the BinarySearch and Sort
methods
Moreover, I particularly like the very last section of that paragraph, which states:
It is to your advantage to use the type-specific implementation of the List<(Of <(T>)>) class instead of using the ArrayList class or writing a strongly typed wrapper collection yourself. The reason is your implementation must do what the .NET Framework does for you already, and the common language runtime can share Microsoft intermediate language code and metadata, which your implementation cannot.
Touché! :)
Based on your recent edit it seems as though you're not performing a 1:1 comparison here. In the List you have a class object and you're looking for the index based on a property, whereas in the ArrayList you just store the values of that property. If so, this is a severely flawed comparison.
To make it a 1:1 comparison you would add the values to the list only, not the class. Or, you would add the class items to the ArrayList. The former would allow you to use IndexOf on both collections. The latter would entail looping through your entire ArrayList and comparing each item till a match was found (and you could do the same for the List), or overriding object.Equals since ArrayList uses that for comparison.
For an interesting read, I suggest taking a look at Rico Mariani's post: Performance Quiz #7 -- Generics Improvements and Costs -- Solution. Even in that post Rico also emphasizes the need to benchmark different scenarios. No blanket statement is issued about ArrayLists, although the general consensus is to use generic lists for performance, type safety, and having a strongly typed collection.
Another related article is: Why should I use List and not ArrayList.
ArrayList seems faster? According to the documentation ( http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx ) List should be faster when using a value type, and the same speed when using a reference type. ArrayList is slower with value types because it needs to box/unbox the values when you're accessing them.
I would expect them to be about the same if they are value-types. There is an extra cast/type-check for ArrayList, but nothing huge. Of course, List<T> should be preferred. If speed is the primary concern (which it almost always isn't, at least not in this way), then you might also want to profile an array (T[]) - harder (=more expensive) to add/remove, of course - but if you are just querying/assigning by index, it should be the fastest. I have had to resort to arrays for some very localised performance critical work, but 99.95% of the time this is overkill and should be avoided.
For example, for any of the 3 approaches (List<T>/ArrayList/T[]) I would expect the assignment cost to be insignificant to the cost of newing up the new instance to put into the storage.
Marc Gravell touched on this in his anwswer - I think it needs to be stressed.
It is usually a waste of time to prematurely optimize your code!
A better approach is to do a simple, well designed first implementation, and test it with anticipated real world data loads.
Often, you will find that it's "fast enough". (It helps to start out with a clear definition of "fast enough" - e.g. "Must be able to find a single CD in a 10,000 CD collection in 3 seconds or less")
If it's not, put a profiler on it. Almost invariably, the bottle neck will NOT be where you expect.
(I learned this the hard way when I brought a whole app to it's knees with single badly chosen string concatenation)

Which do you prefer for interfaces: T[], IEnumerable<T>, IList<T>, or other?

Ok, I'm hoping the community at large will aid us in solving a workplace debate that has been ongoing for a while. This has to do with defining interfaces that either accept or return lists of some type. There are several ways of doing this:
public interface Foo
{
Bar[] Bars { get; }
IEnumerable<Bar> Bars { get; }
ICollection<Bar> Bars { get; }
IList<Bar> Bars { get; }
}
My own preference is to use IEnumerable for arguments and arrays for return values:
public interface Foo
{
void Do(IEnumerable<Bar> bars);
Bar[] Bars { get; }
}
My argument for this approach is that the implementation class can create a List directly from the IEnumerable and simply return it with List.ToArray(). However some believe that IList should be returned instead of an array. The problem I have here is that now your required again to copy it with a ReadOnlyCollection before returning. The option of returning IEnumerable seems troublesome for client code?
What do you use/prefer? (especially with regards to libraries that will be used by other developers outside your organization)
My preference is IEnumerable<T>. Any other of the suggested interfaces gives the appearance of allowing the consumer to modify the underlying collection. This is almost certainly not what you want to do as it's allowing consumers to silently modify an internal collection.
Another good one IMHO, is ReadOnlyCollection<T>. It allows for all of the fun .Count and Indexer properties and unambiguously says to the consumer "you cannot modify my data".
I don't return arrays - they really are a terrible return type to use when creating an API - if you truly need a mutable sequence use the IList<T> or ICollection<T> interface or return a concrete Collection<T> instead.
Also I would suggest that you read Arrays considered somewhat harmful by Eric Lippert:
I got a moral question from an author
of programming language textbooks the
other day requesting my opinions on
whether or not beginner programmers
should be taught how to use arrays.
Rather than actually answer that
question, I gave him a long list of my
opinions about arrays, how I use
arrays, how we expect arrays to be
used in the future, and so on. This
gets a bit long, but like Pascal, I
didn't have time to make it shorter.
Let me start by saying when you
definitely should not use arrays, and
then wax more philosophical about the
future of modern programming and the
role of the array in the coming world.
For property collections that are indexed (and the indices have necessary semantic meaning), you should use ReadOnlyCollection<T> (read only) or IList<T> (read/write). It's the most flexible and expressive. For non-indexed collections, use IEnumerable<T> (read only) or ICollection<T> (read/write).
Method parameters should use IEnumerable<T> unless they 1) need to add/remove items to the collection (ICollection<T>) or 2) require indexes for necesary semantic purposes (IList<T>). If the method can benefit from indexing availability (such as a sorting routine), it can always use as IList<T> or .ToList() when that fails in the implementation.
I think about this in terms of writing the most useful code possible: code that can do more.
Put in those terms, it means I like to accept the weakest interface possible as method arguments, because that makes my code useful from more places. In this case, that's an IEnumerable<T>. Have an array? You can call my method. Have a List? You can call my method. Have an iterator block? You get the idea.
It also means I like my methods to return the strongest interface that is convenient, so that code that relies on the method can easily do more. In this case, that would be IList<T>. Note that this doesn't mean I will construct a list just so I can return it. It just means that if I already have some that implements IList<T>, I may as well use it.
Note that I'm a little unconventional with regards to return types. A more typical approach is to also return weaker types from methods to avoid locking yourself into a specific implementation.
I would prefer IEnumerable as it is the most highlevel of the interfaces giving the end user the opportunity to re-cast as he wishes. Even though this may provide the user with minimum functionality to begin with (basically only enumeration) it would still be enough to cover virtually any need, especially with the extension methods, ToArray(), ToList() etc.
IEnumerable<T> is very useful for lazy-evaluated iteration, especially in scenarios that use method chaining.
But as a return type for a typical data access tier, a Count property is often useful, and I would prefer to return an ICollection<T> with a Count property or possibly IList<T> if I think typical consumers will want to use an indexer.
This is also an indication to the caller that the collection has actually been materialized. And thus the caller can iterate through the returned collection without getting exceptions from the data access tier. This can be important. For example, a service may generate a stream (e.g. SOAP) from the returned collection. It can be awkward if an exception is thrown from the data access layer while generating the stream due to lazy-evaluated iteration, as the output stream is already partially written when the exception is thrown.
Since the Linq extension methods were added to IEnumerable<T>, I've found that my use of the other interfaces has declined considerably; probably around 80%. I used to use List<T> religiously as it had methods that accepted delegates for lazy evaluation like Find, FindAll, ForEach and the like. Since that's available through System.Linq's extensions, I've replaced all those references with IEnumerable<T> references.
I wouldn't go with array, its a type that allows modification yet doesn't have add/remove... kind of like the worst of the pack. If I want to allow modifications, then I would use a type that supports add/remove.
When you want to prevent modifications, you are already wrapping it/copying it, so I don't see what's wrong with a an IEnumerable or a ReadOnlyCollection. I would go with the later ... something I don't like about IEnumerable is that its lazy by nature, yet when you are using with pre-loaded data only to wrap it, calling code that works with it tends to assume pre-loaded data or have extra "unnecessary" lines :( ... that can get ugly results during change.

Categories

Resources