Is there any performance difference between compare method and compare class? - c#

Is there any difference in performance between
List<T>.Sort Method (Comparison<T>)
and
List<T>.Sort Method (IComparer<T>)?
Are there any structural (software architectural) benefits?
When do you use the compare method instead of compare class and vice versa?
EDIT:
The List<T>.Sort Method (IComparer<T>) is faster. Thanks Jim Mischel!
The performance difference is around 1% on my PC.
It seems that the compare class is the faster one.

The difference is that the first accepts a method (anonymous or not) and the second accepts an instance of a comparer object. Sometimes it is easier to define complex and customizable comparer classes rather than write everything inside a single function.
I prefer the first for simple sorting in one dimension and the latter for multidimensional sorting in e.g. data grids.
Using a comparer you can have private members which can often help with caching. This is useful in certain scenarios (again, in complex sorting of a large data set displayed in a grid).
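As a minimal sketch of the two styles described above (LengthComparer and SortDemo are hypothetical names made up for illustration, and the code assumes no null strings): the lambda covers a simple one-dimensional sort, while the comparer class carries its own configuration in a private field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: orders strings by length, in a configurable direction.
public sealed class LengthComparer : IComparer<string>
{
    // Private state that the comparer object carries between calls.
    private readonly bool _descending;

    public LengthComparer(bool descending)
    {
        _descending = descending;
    }

    public int Compare(string x, string y)
    {
        int result = x.Length.CompareTo(y.Length);
        return _descending ? -result : result;
    }
}

public static class SortDemo
{
    // List<T>.Sort(Comparison<T>): a lambda is enough for a simple sort.
    public static List<string> SortWithMethod(IEnumerable<string> words)
    {
        var list = new List<string>(words);
        list.Sort((a, b) => a.Length.CompareTo(b.Length));
        return list;
    }

    // List<T>.Sort(IComparer<T>): the comparer instance holds the settings.
    public static List<string> SortWithClass(IEnumerable<string> words, bool descending)
    {
        var list = new List<string>(words);
        list.Sort(new LengthComparer(descending));
        return list;
    }
}
```

Calling SortDemo.SortWithClass(words, true) reuses the same class for the reversed order, which is the kind of customization that gets awkward inside a single comparison function.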

As I recall, List.Sort(Comparison<T>) instantiates an IComparer<T> and then calls List.Sort(IComparer<T>).
It looks something like this:
class SortComparer<T> : IComparer<T>
{
    private readonly Comparison<T> _compare;
    public SortComparer(Comparison<T> comp)
    {
        _compare = comp;
    }
    public int Compare(T x, T y)
    {
        return _compare(x, y);
    }
}
public void Sort(Comparison<T> comp)
{
    Sort(new SortComparer<T>(comp));
}
So they really end up doing the same thing. When I timed this stuff (back in .NET 3.5), Sort(IComparer<T>) was slightly faster because it didn't have to do the extra dereference on every call. But the difference really wasn't big enough to worry about. This is definitely a case of use whatever works best in your code rather than what performs the fastest.
A little more about it, including information about default IComparer implementations: Of Comparison and IComparer

Related

Most efficient way to compare two List<T>

I'm trying to compare two lists. Here's the extension method I'm trying to use:
public static bool EqualsAll<T>(this IList<T> a, IList<T> b)
{
if (a == null || b == null)
return (a == null && b == null);
if (a.Count != b.Count)
return false;
EqualityComparer<T> comparer = EqualityComparer<T>.Default;
for (int i = 0; i < a.Count; i++)
{
if (!comparer.Equals(a[i], b[i]))
return false;
}
return true;
}
I've already asked this question here, but I need more information. The answerer said that it's better to use SequenceEqual instead of a for loop.
Now my question is: which of the two approaches is the most efficient way to compare two List<T>?
I'm sure they're both relatively equal in performance since both are doing a sequence equal check. SequenceEqual doesn't do any magic - it still loops.
As for which one to use, go with SequenceEqual - it's built into the framework, other programmers already know its functionality, and most importantly, there's no need to reinvent the wheel.
As IList<T> implements IEnumerable<T>, as documented here, I would believe that
a.SequenceEqual(b)
is a reasonable way to do the comparison, with the extension method being documented here.
The most efficient way to compare two List<T> objects is to not go through interface dispatch by using IList. Instead, specialize the type to List<T>. The arguments should be List<T>.
This saves a lot of indirect calls. It's clearly faster than the IEnumerable-based SequenceEqual, which requires two indirect calls per element.
Also, you need to cache the count.
It would be best if List<T> had a built-in method, or implemented a certain interface or allowed access to its internal buffer. But none of that is available.
I guess the fastest possible way is to runtime-compile a function that returns you the internal buffers of each of those lists. Then, you can compare arrays which is much faster. Clearly, this relies on Full Trust and undocumented internals... Proceed with care.
There are also things you can do to avoid the cost of the comparer. It really depends on how important this issue is. You can go to great length to optimize this.
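The two approaches discussed in this thread can be put side by side as follows. This is only a sketch: ListComparison is a made-up class name, and whether the specialized version is measurably faster in a given program is something to benchmark rather than assume.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListComparison
{
    // Framework approach: SequenceEqual works on any IEnumerable<T>.
    public static bool EqualsViaSequenceEqual<T>(IList<T> a, IList<T> b)
    {
        if (a == null || b == null)
            return a == null && b == null;
        return a.SequenceEqual(b);
    }

    // Specialized approach: parameters are List<T>, and the count is read once.
    public static bool EqualsAll<T>(List<T> a, List<T> b)
    {
        if (a == null || b == null)
            return a == null && b == null;
        int count = a.Count; // cached instead of re-read on every iteration
        if (count != b.Count)
            return false;
        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < count; i++)
        {
            if (!comparer.Equals(a[i], b[i]))
                return false;
        }
        return true;
    }
}
```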

What's the role of IEnumerable<T> and why should I use it?

Why should I use IEnumerable<T> when I can make do with...say List<T>? What's the advantage of the former over the latter?
IEnumerable<T> is an interface that tells us that we can enumerate over a sequence of T instances. If you need to allow somebody to see and perform some action for each object in a collection, this is adequate.
List<T>, on the other hand, is a specific implementation of IEnumerable<T> that stores the objects in a specific, known manner. Internally, this may be a very good way to store your values that you expose via IEnumerable<T>, but a List<T> is not always appropriate. For example, if you do not need to access items by index, but constantly insert items at the beginning of your collection and then remove items from the end, a Queue<T> would be far more appropriate to use.
By using IEnumerable<T> in your API, you provide yourself the flexibility to change the internal implementation at any time without changing any other code. This has huge benefits in terms of allowing your code to be flexible and maintainable.
On this point Jeffrey-Richter writes:
When declaring a method’s parameter types, you should specify the weakest type possible,
preferring interfaces over base classes. For example, if you are writing a method that
manipulates a collection of items, it would be best to declare the method’s parameter by
using an interface such as IEnumerable<T> rather than using a strong data type such as List<T> or even a stronger interface type such as ICollection<T> or IList<T>:
// Desired: This method uses a weak parameter type
public void ManipulateItems<T>(IEnumerable<T> collection) { ... }
// Undesired: This method uses a strong parameter type
public void ManipulateItems<T>(List<T> collection) { ... }
The reason, of course, is that someone can call the first method passing in an array object, a List<T> object, a String object, and so on — any object whose type implements IEnumerable<T>. The second method allows only List<T> objects to be passed in; it will not accept an array or a String object. Obviously, the first method is better because it is much more flexible and can be used in a much wider range of scenarios.
Naturally, if you are writing a method that requires a list (not just any enumerable object), then you should declare the parameter type as an IList<T>. You should still avoid declaring the parameter type as List<T>. Using IList<T> allows the caller to pass arrays and any other objects whose type implements IList<T>.
On the flip side, it is usually best to declare a method’s return type by using the strongest type possible (trying not to commit yourself to a specific type).
If you plan to build a public API, it's better to use IEnumerable<T> than List<T>, because you should expose the most minimal interface or class that does the job. List<T> lets you access objects by index if that's required.
Here is a pretty good guideline when to use IEnumerable, ICollection, List<T> and so on.
Different implementations of collections can be enumerable; using IEnumerable makes it clear that what you're interested in is the enumerability, and not the structure of the underlying implementation of the collection.
As mentioned by Mr. Copsey, this has the benefit of providing decoupling from the implementation, but I'm of the contention that a clear definition of the smallest subset of interface functionality as possible (i.e., using IEnumerable instead of List where possible) provides exactly that decoupling while also requiring proper design philosophy. That is to say, you can achieve decoupling but not achieve minimal dependency, but you cannot achieve minimal dependency without achieving maximal decoupling.
Using the concept of iterators you can achieve major improvement in algorithm quality, both in terms of speed and memory usage.
Let's consider the following two code examples. Both parse the file, one stores lines in collection, the other uses enumerable.
First example is O(N) time, and O(N) memory:
IEnumerable<string> lines = SelectLines();
List<Item> items = lines.Select(l=>ParseToItem(l)).ToList();
var itemOfInterest = items.FirstOrDefault(IsItemOfInterest);
Second example is O(N) time, O(1) memory. Additionally, even though the asymptotic time complexity is still O(N), on average it loads half as many items as the first example:
var itemOfInterest = lines.FirstOrDefault(l=>IsItemOfInterest(ParseToItem(l)));
Here is the code of SelectLines()
IEnumerable<string> SelectLines()
{
...
using(var reader = ...)
while((line=reader.ReadLine())!=null)
yield return line;
}
Here is why it loads half as many items as the first example on average. Let's say the probability of finding the element is the same at every position in the file. In the case of IEnumerable, only lines up to the element of interest will be read from the file. In the case of a ToList call over the enumerable, the entire file would be read before even starting the search.
Of course, the List in the first example would hold all the items in memory, that is why O(N) memory usage.
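To make the difference observable, here is a small sketch that counts how many lines the iterator actually produces. It is an in-memory stand-in, not the file-based SelectLines() from the answer: LazyDemo, LinesRead and the string[] parameter are assumptions introduced for the demonstration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many lines the iterator has handed out.
    public static int LinesRead;

    // Stand-in for SelectLines(): yields lines from memory instead of a file.
    public static IEnumerable<string> SelectLines(string[] source)
    {
        foreach (string line in source)
        {
            LinesRead++;
            yield return line;
        }
    }

    // ToList() first: every line is read before the search starts.
    public static int CountReadsEager(string[] source, string wanted)
    {
        LinesRead = 0;
        List<string> all = SelectLines(source).ToList();
        all.FirstOrDefault(l => l == wanted);
        return LinesRead;
    }

    // Search directly on the enumerable: reading stops at the first match.
    public static int CountReadsLazy(string[] source, string wanted)
    {
        LinesRead = 0;
        SelectLines(source).FirstOrDefault(l => l == wanted);
        return LinesRead;
    }
}
```

With the four lines "a", "b", "c", "d" and a search for "b", the eager version reads all four lines while the lazy one reads two.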
Usually you don't use IEnumerable directly. It is the base interface for a number of other collections that you are more likely to use. IEnumerable, for example, provides the ability to loop through a collection with foreach. That is used by many implementing classes such as List<T>. But IEnumerable doesn't offer a sort method (though you can use Linq for that) while some other generic collections like List<T> do have that method.
Oh sure, you can use it to create your own custom collection types. But for everyday stuff, it probably isn't as useful as the collections derived from it.
IEnumerable gives you the way to implement your own logic of storing and iterating over object collection
Why implement IEnumerable in your class?
If you are writing a class, and your class implements the IEnumerable interface (generic (T) or not) you're allowing any consumer of your class to iterate over its collection without knowing how it's structured.
A LinkedList is implemented differently than a Queue, a Stack, a BinaryTree, a HashTable, Graph, etc. The collection represented by your class might be structured in different ways.
As a "consumer" (if you are writing a class, and your class consumes/makes-use-of an class-object that implements IEnumerable) you can use it without concerning yourself how it's implemented. Sometimes the consumer class doesn't care about the implementation - it just wants to go over all the items (printing them? changing them? comparing them? etc.)
(So as a consumer, if your task is to iterate over all items in a BinaryTree class, and you skipped that lesson in Data-Structures-101 - if the BinaryTree coder implemented the IEnumerable - you're in luck! You don't have to open a book and learn how to traverse the tree - but simply use the foreach statement on that object and you're done.)
As a "producer" (writing a class that holds the data-structure/collection) you might not want consumers of your class to handle how it's structured (afraid they might break it, maybe). So you can make the collection private, and only expose a public IEnumerator.
It can also allow for some uniformity - a collection might have a few ways to iterate over its items (PreOrder, InOrder, PostOrder, Breadth First, Depth First etc.) - but there's only 1 implementation for IEnumerable. You can use this to set the "default" way to iterate over the collection.
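The "producer" side described above might look like the following sketch. Playlist is a hypothetical class invented for illustration: the backing list stays private, and GetEnumerator defines the single default iteration order (here, insertion order).

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection class: storage is private, iteration is public.
public sealed class Playlist : IEnumerable<string>
{
    private readonly List<string> _tracks = new List<string>();

    public void Add(string track)
    {
        _tracks.Add(track);
    }

    // The default way to iterate: tracks in the order they were added.
    public IEnumerator<string> GetEnumerator()
    {
        foreach (string track in _tracks)
            yield return track;
    }

    // The non-generic interface simply forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

A consumer can then write foreach (string track in playlist) without knowing or caring that a List<string> sits underneath.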
Why use IEnumerable in your method?
If I write a method that takes a collection, iterates over it, and takes action on the items (aggregate them? compares them? etc.) why should I restrict myself to 1 type of collections?
Writing this method public void Sum(List<int> list) {...} to sum all items in the collection means I can only receive a list and sum over it. Writing this public void Sum(IEnumerable<int> collection) {...} means I can take any object that implements IEnumerable (Lists, Queues, Stacks, etc.) and sum all of their items as well.
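A sketch of the second signature, changed to return the total so the result is usable (the answer's version is declared void; Totals is a made-up class name):

```csharp
using System.Collections.Generic;

public static class Totals
{
    // Accepts anything enumerable: arrays, lists, queues, stacks, iterators.
    public static int Sum(IEnumerable<int> collection)
    {
        int total = 0;
        foreach (int value in collection)
            total += value;
        return total;
    }
}
```

The same method can be called as Totals.Sum(new[] { 1, 2, 3 }), Totals.Sum(new List<int> { 1, 2, 3 }) or Totals.Sum(new Queue<int>(new[] { 4, 5 })) without any conversion at the call site.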
Other Considerations
There are also the issues of deferred execution and unmanaged resources. IEnumerable works with the yield syntax, which means you go over each item by itself and can perform all kinds of computations before and after. And again, this happens one-by-one, so you don't have to hold the whole collection when you start. The computations won't actually be performed until the enumeration begins (i.e. until you run the foreach loop). And that might be useful and more efficient in certain cases. For example, your class might not hold any collection in memory, but rather iterate over all files existing in a certain directory, or items in a certain DB, or other unmanaged resources. IEnumerable can step in and do it for you (you could also do it without IEnumerable, but IEnumerable "fits" conceptually, plus it gives you the benefit of being able to use the object produced in a foreach loop).
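Deferred execution is easiest to see with a log. In this sketch (DeferredDemo and its Log list are hypothetical, added only to record the order of events), calling Numbers() runs none of the method body; each step runs only when the foreach loop asks for the next item.

```csharp
using System.Collections.Generic;

public static class DeferredDemo
{
    // Records what ran, and in which order.
    public static readonly List<string> Log = new List<string>();

    // Iterator method: the body executes piece by piece during enumeration.
    public static IEnumerable<int> Numbers()
    {
        Log.Add("start");
        yield return 1;
        Log.Add("between");
        yield return 2;
        Log.Add("end");
    }
}
```

After var seq = DeferredDemo.Numbers(); the log is still empty. Looping over seq and logging each value interleaves the entries: start, the first value, between, the second value, end.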
Implementation of IEnumerable<T> is generally the preferred way for a class to indicate that it should be usable with a "foreach" loop, and that multiple "foreach" loops on the same object should operate independently. Although there are uses of IEnumerable<T> other than "foreach", the normal indication that one should implement IEnumerable is that the class is one where it would make sense to say "foreach foo in classItem {foo.do_something();}".
IEnumerable is taking advantage of deferred execution as explained here: IEnumerable vs List - What to Use? How do they work?
IEnumerable<T> is covariant in T (since C# 4), which enables implicit reference conversion from a sequence of a derived type to a sequence of its base type. Consider the following example:
public abstract class Vehicle
{
}
public class Car :Vehicle
{
}
private void doSomething1(IEnumerable<Vehicle> vehicles)
{
}
private void doSomething2(List<Vehicle> vehicles)
{
}
var vec = new List<Car>();
doSomething1(vec); // this is ok
doSomething2(vec); // this will give a compilation error

When to use each of T[], List<T>, IEnumerable<T>?

I usually find myself doing something like:
string[] things = arrayReturningMethod();
int index = things.ToList<string>().FindIndex((s) => s.Equals("FOO"));
//do something with index
return things.Distinct(); //which returns an IEnumerable<string>
and I find all this mixup of types/interface a bit confusing and it tickles my potential performance problem antennae (which I ignore until proven right, of course).
Is this idiomatic and proper C# or is there a better alternative to avoid casting back and forth to access the proper methods to work with the data?
EDIT:
The question is actually twofold:
When is it proper to use either the IEnumerable interface or an array or a list (or any other IEnumerable implementing type) directly (when accepting parameters)?
Should you freely move between IEnumerables (implementation unknown) and lists and IEnumerables and arrays and arrays and Lists or is that non idiomatic (there are better ways to do it)/ non performant (not typically relevant, but might be in some cases) / just plain ugly (unmaintainable, unreadable)?
In regards to performance...
Converting from List to T[] involves copying all the data from the original list to a newly allocated array.
Converting from T[] to List also involves copying all the data from the original array to a newly allocated List.
Converting from either List or T[] to IEnumerable involves casting, which is a few CPU cycles.
Converting from IEnumerable to List involves downcasting (a runtime type check), which is also a few CPU cycles.
Converting from IEnumerable to T[] also involves downcasting.
You can't cast an IEnumerable to T[] or List unless it was a T[] or List respectively to begin with. You can use the ToArray or ToList functions, but those will also result in a copy being made.
Accessing all the values in order from start to end in a T[] will, in a straightforward loop, be optimized to use straightforward pointer arithmetic -- which makes it the fastest of them all.
Accessing all the values in order from start to end in a List involves a check on each iteration to make sure that you aren't accessing a value outside the array's bounds, and then the actual accessing of the array value.
Accessing all the values in an IEnumerable involves creating an enumerator object, calling the MoveNext() function which advances the position, and then calling the Current property which gives you the actual value and sticks it in the variable that you specified in your foreach statement. Generally, this isn't as bad as it sounds.
Accessing an arbitrary value in an IEnumerable involves starting at the beginning and calling MoveNext() as many times as you need to get to that value. Generally, this is as bad as it sounds.
In regards to idioms...
In general, IEnumerable is useful for public properties, function parameters, and often for return values -- and only if you know that you're going to be using the values sequentially.
For instance, if you had a function PrintValues, if it was written as PrintValues(List<T> values), it would only be able to deal with List values, so the user would first have to convert, if for instance they were using a T[]. Likewise with if the function was PrintValues(T[] values). But if it was PrintValues(IEnumerable<T> values), it would be able to deal with Lists, T[]s, stacks, hashtables, dictionaries, strings, sets, etc -- any collection that implements IEnumerable, which is practically every collection.
In regards to internal use...
Use a List only if you're not sure how many items will need to be in it.
Use a T[] if you know how many items will need to be in it, but need to access the values in an arbitrary order.
Stick with the IEnumerable if that's what you've been given and you just need to use it sequentially. Many functions will return IEnumerables. If you do need to access values from an IEnumerable in an arbitrary order, use ToArray().
Also, note that casting is different from using ToArray() or ToList() -- the latter involves copying the values, which is indeed a performance and memory hit if you have a lot of elements. The former simply is to say that "A dog is an animal, so like any animal, it can eat" (upcast) or "This animal happens to be a dog, so it can bark" (downcast). Likewise, all Lists and T[]s are IEnumerables, but only some IEnumerables are Lists or T[]s.
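The difference between casting and copying can be checked with reference equality. A minimal sketch (CastVersusCopy is a made-up class name): casting to the interface and back yields the very same object, ToList() yields a new list with the same contents, and a cast to List<int> fails when the sequence was never a List<int>.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CastVersusCopy
{
    // Casting: no copy is made; both variables refer to the same object.
    public static bool CastKeepsSameObject(List<int> list)
    {
        IEnumerable<int> sequence = list;        // view the list through the interface
        List<int> back = sequence as List<int>;  // null if it is not actually a List<int>
        return ReferenceEquals(list, back);
    }

    // ToList(): a new list is allocated and the elements are copied into it.
    public static bool ToListMakesCopy(List<int> list)
    {
        IEnumerable<int> sequence = list;
        List<int> copy = sequence.ToList();
        return !ReferenceEquals(list, copy) && copy.SequenceEqual(list);
    }

    // An array is an IEnumerable<int>, but it cannot be cast to List<int>.
    public static bool CastToListFails(IEnumerable<int> sequence)
    {
        return (sequence as List<int>) == null;
    }
}
```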
A good rule of thumb is to always use IEnumerable (when declaring your variables/method parameters/method return types/properties/etc.) unless you have a good reason not to. By far the most type-compatible with other (especially extension) methods.
Well, you've got two apples and an orange that you are comparing.
The two apples are the array and the List.
An array in C# is a C-style array that has garbage collection built in. The upside of using them is that they have very little overhead, assuming you don't need to move things around. The bad thing is that they are not as efficient when you are adding things, removing things, and otherwise changing the array around, as memory gets shuffled around.
A List is a C# style dynamic array (similar to the vector<> class in C++). There is more overhead, but they are more convenient when you need to add and remove things a lot: the List manages a backing array for you and grows it with spare capacity, so you don't reallocate and copy by hand on every change.
The best comparison I could give is saying that arrays are to Lists as strings are to StringBuilders.
The orange is 'IEnumerable'. This is not a datatype, but rather it is an interface. When a class implements the IEnumerable interface, it allows that object to be used in a foreach() loop.
When you return the list (as you did in your example), you were not converting the list to an IEnumerable. A list already is an IEnumerable object.
EDIT: When to convert between the two:
It depends on the application. There is very little that can be done with an array that cannot be done with a List, so I would generally recommend the List. Probably the best thing to do is to make a design decision that you are going to use one or the other, that way you don't have to switch between the two. If you rely on an external library, abstract it away to maintain consistent usage.
Hope this clears a little bit of the fog.
Looks to me like the problem is that you haven't bothered learning how to search an array. Hint: Array.IndexOf or Array.BinarySearch depending on whether the array is sorted.
You're right that converting to a list is a bad idea: it wastes space and time and makes the code less readable. Also, blindly upcasting to IEnumerable slows matters down and also completely prevents use of certain algorithms (such as binary search).
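The hint above can be sketched as follows. ArraySearch and its method names are hypothetical; the BinarySearch variant assumes the array is already sorted with the same ordinal comparer that is passed in.

```csharp
using System;

public static class ArraySearch
{
    // Linear search directly on the array; returns -1 when "FOO" is absent.
    public static int FindFoo(string[] things)
    {
        return Array.IndexOf(things, "FOO");
    }

    // Binary search; only valid when sortedThings is sorted ordinally.
    // Returns a negative number when the value is not found.
    public static int FindInSorted(string[] sortedThings, string value)
    {
        return Array.BinarySearch(sortedThings, value, StringComparer.Ordinal);
    }
}
```

Neither call allocates a List<string>, so the ToList() conversion from the question is not needed just to find an index.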
I try to avoid rapidly jumping between data types if it can be avoided.
It must be the case that each situation similar to that you described is sufficiently different so as to prevent a dogmatic rule about transforming your types; however, it is generally good practice to select a data structure that provides as best as possible the interface you need without having to copy elements needlessly to new data structures.
When to use what?
I would suggest returning the most specific type, and taking in the most flexible type.
Like this:
public int[] DoSomething(IEnumerable<int> inputs)
{
//...
}
public List<int> DoSomethingElse(IList<int> inputs)
{
//...
}
That way you can call methods on List<T> for whatever you get back from the method in addition to treating it as an IEnumerable. On the inputs, use the most flexible type possible, so you don't dictate to the users of your method what kind of collection to create.
You're right to ignore the 'performance problem' antennae until you actually have a performance problem. Most performance problems come from doing too much I/O or too much locking or doing one of them wrong, and none of these apply to this question.
My general approach is:
1. Use T[] for 'static' or 'snapshot'-style information. Use for things where calling .Add() wouldn't make sense anyway, and you don't need the extra methods List<T> gives you.
2. Accept IEnumerable<T> if you don't really care what you're given and don't need a constant-time .Length/.Count.
3. Only return IEnumerable<T> when you're doing simple manipulations of an input IEnumerable<T> or when you specifically want to make use of the yield syntax to do your work lazily.
4. In all other cases, use List<T>. It's just too flexible.
Corollary to #4: don't be afraid of ToList(). ToList() is your friend. It forces the IEnumerable<T> to evaluate right then (useful for when you're stacking several where clauses). Don't go nuts with it, but feel free to call it once you've built up your full where clause before you do the foreach over it (or the like).
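What "evaluate right then" buys you can be shown by counting predicate calls. In this sketch (ToListDemo, PredicateCalls and IsEven are hypothetical names for the demonstration), the same where clause is consumed twice, once as a lazy query and once after ToList().

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ToListDemo
{
    // Counts how often the filter predicate has been invoked.
    public static int PredicateCalls;

    private static bool IsEven(int n)
    {
        PredicateCalls++;
        return n % 2 == 0;
    }

    // Enumerates the filtered sequence twice and reports the predicate calls.
    public static int CallsWhenEnumeratedTwice(bool materialize)
    {
        PredicateCalls = 0;
        IEnumerable<int> query = Enumerable.Range(1, 4).Where(n => IsEven(n));
        if (materialize)
            query = query.ToList(); // the filter runs once, here
        query.Count();
        query.Count();
        return PredicateCalls;
    }
}
```

Without ToList() the filter runs again for each pass (8 calls for 4 items enumerated twice); with ToList() it runs once (4 calls) and later passes read the stored results.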
Of course, this is just a rough guideline. Just please try to follow the same pattern in the same codebase -- code styles that jump around make it harder for maintenance coders to get into your frame of mind.

Variable number of arguments without boxing the value-types?

public void DoSomething(params object[] args)
{
// ...
}
The problem with the above signature is that every value-type that will be passed to that method will be boxed implicitly, and this is serious performance issue for me.
Is there a way to declare a method that accepts a variable number of arguments without boxing the value-types?
Thanks.
You can use generics:
public void DoSomething<T>(params T[] args)
{
}
However, this will only allow a single type of ValueType to be specified. If you need to mix or match value types, you'll have to allow boxing to occur, as you're doing now, or provide specific overloads for different numbers of parameters.
Edit: If you need more than one type of parameter, you can use overloads to accomplish this, to some degree.
public void DoSomething<T,U>(T arg1, params U[] args) {}
public void DoSomething<T,U>(T arg1, T arg2, params U[] args) {}
Unfortunately, this requires multiple overloads to exist for your types.
Alternatively, you could pass in arrays directly:
public void DoSomething<T,U>(T[] args1, U[] args2) {}
You lose the nice compiler syntax, but then you can have any number of both parameters passed.
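A small sketch of the generic params approach (Varargs and Describe are made-up names): the type argument is inferred from the arguments, so value types travel in a T[] of their own type instead of an object[].

```csharp
public static class Varargs
{
    // T is inferred at the call site; for Describe(1, 2, 3) args is an int[].
    public static string Describe<T>(params T[] args)
    {
        return typeof(T).Name + "[" + args.Length + "]";
    }
}
```

Varargs.Describe(1, 2, 3) reports Int32[3]. Mixing unrelated types still requires falling back to Describe<object>(1, "a"), which boxes the value types exactly as the original signature does.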
Not presently, no, and I haven't seen anything addressing the issue in the .NET 4 info that's been released.
If it's a huge performance problem for you, you might consider several overloads of commonly seen parameter lists.
I wonder, though: is it really a performance problem, or are you prematurely optimizing?
Let's assume the code you're calling this method from is aware of argument types. If so, you can pack them into appropriate Tuple type from .NET 4, and pass its instance (Tuple is reference type) to such method as object (since there is no common base for all the Tuples).
The main problem here is that it isn't easy to process the arguments inside this method without boxing / unboxing, and likely, even without reflection. Try to think what must be done to extract, let's say, Nth argument without boxing. You'll end up with understanding you must either deal with dictionary lookup(s) there (involving either regular Dictionary<K,V> or internal dictionaries used by CLR), or with boxing. Obviously, dictionary lookups are much more costly.
I'm writing this because actually we developed a solution for very similar problem: we must be able to operate with our own Tuples without boxing - mainly, to compare and deserialize them (Tuples are used by database engine we develop, so performance of any basic operation is really essential in our case).
But:
We end up with pretty complex solution. Take a look e.g. at TupleComparer.
Effect of absence of boxing is actually not as good as we expected: each boxing / unboxing operation is replaced by a single array indexing and few virtual method calls, the cost of both ways is almost identical.
The only benefit of approach we developed is that we don't "flood" Gen0 by garbage, so Gen0 collections happen much more rarely. Since Gen0 collection cost is proportional to the space allocated by "live" objects and to their count, this brings noticeable advantage, if other allocations intermix with (or simply happen during) the execution of algorithm we try to optimize by this way.
Results: after this optimization our synthetic tests were showing from 0% to 200-300% performance increase; on the other hand, a simple performance test of the database engine itself has shown much less impressive improvement (about 5-10%). A lot of time was wasted at the layers above (there is a pretty complex ORM as well), but... Most likely that's what you'll really see after implementing similar stuff.
In short, I advise you to focus on something else. If it becomes fully clear that this is a major performance problem in your application, and there are no other good ways of resolving it, well, go ahead... Otherwise you're simply stealing from your customer or yourself by doing premature optimization.
For a completely generic implementation, the common workaround is to use a fluent pattern. Something like this:
public class ClassThatDoes
{
public ClassThatDoes DoSomething<T>(T arg) where T : struct
{
// process
return this;
}
}
Now you call:
classThatDoes.DoSomething(1).DoSomething(1m).DoSomething(DateTime.Now)//and so on
However that doesn't work with static classes (extension methods are ok since you can return this).
Your question is basically the same as this: Can I have a variable number of generic parameters? asked in a different way.
Or accept an array of items with params keyword:
public ClassThatDoes DoSomething<T>(params T[] arg) where T : struct
{
// process
return this;
}
and call:
classThatDoes.DoSomething(1, 2, 3)
.DoSomething(1m, 2m, 3m)
.DoSomething(DateTime.Now) //etc
Whether the array creating overhead is less than boxing overhead is something you will have to decide yourself.
In C# 4.0 you can use named (and thus optional) parameters! More info on this blog post

Which do you prefer for interfaces: T[], IEnumerable<T>, IList<T>, or other?

Ok, I'm hoping the community at large will aid us in solving a workplace debate that has been ongoing for a while. This has to do with defining interfaces that either accept or return lists of some type. There are several ways of doing this:
public interface Foo
{
Bar[] Bars { get; }
IEnumerable<Bar> Bars { get; }
ICollection<Bar> Bars { get; }
IList<Bar> Bars { get; }
}
My own preference is to use IEnumerable for arguments and arrays for return values:
public interface Foo
{
void Do(IEnumerable<Bar> bars);
Bar[] Bars { get; }
}
My argument for this approach is that the implementation class can create a List directly from the IEnumerable and simply return it with List.ToArray(). However some believe that IList should be returned instead of an array. The problem I have here is that now you're required again to copy it with a ReadOnlyCollection before returning. The option of returning IEnumerable seems troublesome for client code?
What do you use/prefer? (especially with regards to libraries that will be used by other developers outside your organization)
My preference is IEnumerable<T>. Any other of the suggested interfaces gives the appearance of allowing the consumer to modify the underlying collection. This is almost certainly not what you want to do as it's allowing consumers to silently modify an internal collection.
Another good one IMHO, is ReadOnlyCollection<T>. It allows for all of the fun .Count and Indexer properties and unambiguously says to the consumer "you cannot modify my data".
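A sketch of the ReadOnlyCollection<T> option, using string items in place of the question's Bar type so the example is self-contained (BarHolder and AddBar are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class BarHolder
{
    private readonly List<string> _bars = new List<string>();

    public void AddBar(string bar)
    {
        _bars.Add(bar);
    }

    // AsReadOnly() returns a read-only wrapper around the list, not a copy.
    public ReadOnlyCollection<string> Bars
    {
        get { return _bars.AsReadOnly(); }
    }
}
```

Consumers get Count and the indexer, while an attempt to add through the IList<string> interface throws NotSupportedException. Because the wrapper is a live view, later calls to AddBar are visible through a previously returned Bars instance.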
I don't return arrays - they really are a terrible return type to use when creating an API - if you truly need a mutable sequence use the IList<T> or ICollection<T> interface or return a concrete Collection<T> instead.
Also I would suggest that you read Arrays considered somewhat harmful by Eric Lippert:
I got a moral question from an author of programming language textbooks the other day requesting my opinions on whether or not beginner programmers should be taught how to use arrays.
Rather than actually answer that question, I gave him a long list of my opinions about arrays, how I use arrays, how we expect arrays to be used in the future, and so on. This gets a bit long, but like Pascal, I didn't have time to make it shorter.
Let me start by saying when you definitely should not use arrays, and then wax more philosophical about the future of modern programming and the role of the array in the coming world.
For property collections that are indexed (and the indices have necessary semantic meaning), you should use ReadOnlyCollection<T> (read only) or IList<T> (read/write). It's the most flexible and expressive. For non-indexed collections, use IEnumerable<T> (read only) or ICollection<T> (read/write).
Method parameters should use IEnumerable<T> unless they 1) need to add/remove items to the collection (ICollection<T>) or 2) require indexes for necessary semantic purposes (IList<T>). If the method can benefit from indexing availability (such as a sorting routine), it can always use as IList<T> or .ToList() when that fails in the implementation.
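The "as IList<T>, or ToList() when that fails" idea might look like this sketch (Indexing and MiddleItem are made-up names; an empty input is not handled and would throw on the indexer):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Indexing
{
    // Accepts any sequence, but uses indexing when the caller already has a list.
    public static T MiddleItem<T>(IEnumerable<T> items)
    {
        // No copy if the argument is already indexable; otherwise materialize once.
        IList<T> list = items as IList<T> ?? items.ToList();
        return list[list.Count / 2];
    }
}
```

Arrays and List<T> take the cheap path because they already implement IList<T>; an iterator block or a LINQ query is copied into a list first.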
I think about this in terms of writing the most useful code possible: code that can do more.
Put in those terms, it means I like to accept the weakest interface possible as method arguments, because that makes my code useful from more places. In this case, that's an IEnumerable<T>. Have an array? You can call my method. Have a List? You can call my method. Have an iterator block? You get the idea.
It also means I like my methods to return the strongest interface that is convenient, so that code that relies on the method can easily do more. In this case, that would be IList<T>. Note that this doesn't mean I will construct a list just so I can return it. It just means that if I already have something that implements IList<T>, I may as well use it.
Note that I'm a little unconventional with regards to return types. A more typical approach is to also return weaker types from methods to avoid locking yourself into a specific implementation.
I would prefer IEnumerable as it is the most highlevel of the interfaces giving the end user the opportunity to re-cast as he wishes. Even though this may provide the user with minimum functionality to begin with (basically only enumeration) it would still be enough to cover virtually any need, especially with the extension methods, ToArray(), ToList() etc.
IEnumerable<T> is very useful for lazy-evaluated iteration, especially in scenarios that use method chaining.
But as a return type for a typical data access tier, a Count property is often useful, and I would prefer to return an ICollection<T> with a Count property or possibly IList<T> if I think typical consumers will want to use an indexer.
This is also an indication to the caller that the collection has actually been materialized. And thus the caller can iterate through the returned collection without getting exceptions from the data access tier. This can be important. For example, a service may generate a stream (e.g. SOAP) from the returned collection. It can be awkward if an exception is thrown from the data access layer while generating the stream due to lazy-evaluated iteration, as the output stream is already partially written when the exception is thrown.
Since the Linq extension methods were added to IEnumerable<T>, I've found that my use of the other interfaces has declined considerably; probably around 80%. I used to use List<T> religiously as it had methods that accepted delegates for lazy evaluation like Find, FindAll, ForEach and the like. Since that's available through System.Linq's extensions, I've replaced all those references with IEnumerable<T> references.
I wouldn't go with array; it's a type that allows modification yet doesn't have add/remove... kind of like the worst of the pack. If I want to allow modifications, then I would use a type that supports add/remove.
When you want to prevent modifications, you are already wrapping it/copying it, so I don't see what's wrong with an IEnumerable or a ReadOnlyCollection. I would go with the latter ... something I don't like about IEnumerable is that it's lazy by nature, yet when you are using it with pre-loaded data only to wrap it, calling code that works with it tends to assume pre-loaded data or have extra "unnecessary" lines :( ... that can get ugly results during change.
