Every time I create an object that has a collection property, I go back and forth on the best way to do it:
1) a public property with a getter that returns a reference to a private variable
2) explicit get_ObjList and set_ObjList methods that return and create new or cloned objects every time
3) an explicit get_ObjList that returns an IEnumerator and a set_ObjList that takes an IEnumerator
Does it make a difference if the collection is an array (i.e., objList.Clone()) versus a List?
If returning the actual collection as a reference is so bad because it creates dependencies, then why return any property as a reference? Any time you expose a child object as a reference, the internals of that child can be changed without the parent "knowing", unless the child has a property-changed event. Is there a risk of memory leaks?
And don't options 2 and 3 break serialization? Is this a catch-22, or do you have to implement custom serialization any time you have a collection property?
The generic ReadOnlyCollection seems like a nice compromise for general use. It wraps an IList and restricts access to it. Maybe this helps with memory leaks and serialization. However, it still has enumeration concerns.
Maybe it just depends. If you don't care that the collection is modified, then just expose it as a public accessor over a private variable per #1. If you don't want other programs to modify the collection then #2 and/or #3 is better.
Implicit in the question is why should one method be used over another and what are the ramifications on security, memory, serialization, etc.?
How you expose a collection depends entirely on how users are intended to interact with it.
1) If users will be adding and removing items from an object's collection, then a simple get-only collection property is best (option #1 from the original question):
private readonly Collection<T> myCollection_ = new ...;
public Collection<T> MyCollection {
get { return this.myCollection_; }
}
This strategy is used for the Items collections on the WindowsForms and WPF ItemsControl controls, where users add and remove items they want the control to display. These controls publish the actual collection and use callbacks or event listeners to keep track of items.
WPF also exposes some settable collections to allow users to display a collection of items they control, such as the ItemsSource property on ItemsControl (option #3 from the original question). However, this is not a common use case.
2) If users will only be reading data maintained by the object, then you can use a readonly collection, as Quibblesome suggested:
private readonly List<T> myPrivateCollection_ = new ...;
private ReadOnlyCollection<T> myPrivateCollectionView_;
public ReadOnlyCollection<T> MyCollection {
get {
if( this.myPrivateCollectionView_ == null ) { /* lazily initialize view */ }
return this.myPrivateCollectionView_;
}
}
Note that ReadOnlyCollection<T> provides a live view of the underlying collection, so you only need to create the view once.
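To illustrate the "live view" behaviour, here is a minimal sketch. The `Inventory` class and its member names are hypothetical, chosen only for this example; the lazy initialization mirrors the placeholder comment in the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical class for illustration only.
public class Inventory
{
    private readonly List<string> items_ = new List<string>();
    private ReadOnlyCollection<string> itemsView_;

    public ReadOnlyCollection<string> Items
    {
        get
        {
            // Create the wrapper once; it keeps reading from items_.
            if (this.itemsView_ == null)
                this.itemsView_ = new ReadOnlyCollection<string>(this.items_);
            return this.itemsView_;
        }
    }

    // Mutation goes through the owning class, not through the view.
    public void Add(string item)
    {
        this.items_.Add(item);
    }
}
```

A caller that holds on to `Items` should see items added later through `Add`, because the wrapper does not copy the list.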
If the internal collection does not implement IList<T>, or if you want to restrict access to more advanced users, you can instead wrap access to the collection through an enumerator:
public IEnumerable<T> MyCollection {
get {
foreach( T item in this.myPrivateCollection_ )
yield return item;
}
}
This approach is simple to implement and also provides access to all the members without exposing the internal collection. However, it does require that the collection remain unmodified while it is being enumerated, as the BCL collection classes will throw an exception if you continue enumerating a collection after it has been modified. If the underlying collection is likely to change, you can either create a light wrapper that will enumerate the collection safely, or return a copy of the collection.
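As a rough sketch of the difference between enumerating the live list and returning a copy (the `Roster` class and its members are made up for this example):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only.
public class Roster
{
    private readonly List<string> names_ = new List<string> { "Ann", "Bob" };

    // Iterates the live list: List<T> throws InvalidOperationException
    // if the list is modified while this enumeration is in progress.
    public IEnumerable<string> LiveNames
    {
        get
        {
            foreach (string name in this.names_)
                yield return name;
        }
    }

    // Returns a snapshot: callers enumerate their own copy.
    public IEnumerable<string> SnapshotNames
    {
        get { return this.names_.ToArray(); }
    }

    public void Add(string name)
    {
        this.names_.Add(name);
    }
}
```

The snapshot costs an allocation and a copy on every call, so it is a trade-off rather than a free fix.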
3) Finally, if you need to expose arrays rather than higher-level collections, then you should return a copy of the array to prevent users from modifying it (option #2 from the original question):
private T[] myArray_;
public T[] GetMyArray( ) {
T[] copy = new T[this.myArray_.Length];
this.myArray_.CopyTo( copy, 0 );
return copy;
// Note: if you are using LINQ, calling the 'ToArray( )'
// extension method will create a copy for you.
}
You should not expose the underlying array through a property, as you will not be able to tell when users modify it. To allow modifying the array, you can either add a corresponding SetMyArray( T[] array ) method, or use a custom indexer:
public T this[int index] {
get { return this.myArray_[index]; }
set {
// TODO: validate new value; raise change event; etc.
this.myArray_[index] = value;
}
}
(of course, by implementing a custom indexer, you will be duplicating the work of the BCL classes :)
I usually go for this, a public getter that returns System.Collections.ObjectModel.ReadOnlyCollection:
public ReadOnlyCollection<SomeClass> Collection
{
get
{
return new ReadOnlyCollection<SomeClass>(myList);
}
}
And public methods on the object to modify the collection.
void Clear();
void Add(SomeClass item);
If the class is supposed to be a repository for other people to mess with then I just expose the private variable as per method #1 as it saves writing your own API, but I tend to shy away from that in production code.
ReadOnlyCollection still has the disadvantage that the consumer can't be sure that the original collection won't be changed at an inopportune time. Instead, you can use immutable collections. When you need to make a change, you do not modify the original; you are given a modified copy instead. The way they are implemented, their performance is competitive with that of the mutable collections, and it can be even better if you would otherwise have to copy the original several times in order to make a number of different (incompatible) changes to each copy.
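A minimal sketch of that idea, assuming the System.Collections.Immutable package is available (it ships with modern .NET and is a NuGet package on .NET Framework). The `Playlist` class is hypothetical:

```csharp
using System;
using System.Collections.Immutable;

// Hypothetical class for illustration only.
public class Playlist
{
    // Consumers receive an immutable list; it never changes under them.
    public ImmutableList<string> Tracks { get; private set; }

    public Playlist()
    {
        this.Tracks = ImmutableList<string>.Empty;
    }

    public void AddTrack(string track)
    {
        // Add does not modify the existing list; it returns a new one.
        this.Tracks = this.Tracks.Add(track);
    }
}
```

A caller that read `Tracks` before a later `AddTrack` call keeps seeing the old contents, which is exactly the guarantee ReadOnlyCollection cannot give.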
I recommend using the new IReadOnlyList<T> and IReadOnlyCollection<T> interfaces to expose a collection (requires .NET 4.5).
Example:
public class AddressBook
{
private readonly List<Contact> contacts;
public AddressBook()
{
this.contacts = new List<Contact>();
}
public IReadOnlyList<Contact> Contacts { get { return contacts; } }
public void AddContact(Contact contact)
{
contacts.Add(contact);
}
public void RemoveContact(Contact contact)
{
contacts.Remove(contact);
}
}
If you need to guarantee that the collection can not be manipulated from outside then consider ReadOnlyCollection<T> or the new Immutable collections.
Avoid using the interface IEnumerable<T> to expose a collection.
This interface does not give any guarantee that multiple enumerations perform well. If the IEnumerable represents a query, then every enumeration executes the query again. Developers who get an instance of IEnumerable do not know whether it represents a collection or a query.
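The re-execution can be made visible with a counter. This is only a sketch; `QueryDemo`, `Evaluations`, and `Squares` are names invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class QueryDemo
{
    // Counts how many times the projection has run.
    public static int Evaluations;

    // Returns a deferred LINQ query, not a materialized collection.
    public static IEnumerable<int> Squares(IEnumerable<int> source)
    {
        return source.Select(x =>
        {
            Evaluations++;
            return x * x;
        });
    }
}
```

Enumerating the result twice runs the projection twice per element, whereas calling `ToList()` once and reusing the list runs it only once per element.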
More about this topic can be read on this Wiki page.
If you're simply looking to expose a collection on your instance, then using a getter/setter to a private member variable seems like the most sensible solution to me (your first proposed option).
Why do you suggest using ReadOnlyCollection(T) is a compromise? If you still need to get change notifications made on the original wrapped IList you could also use a ReadOnlyObservableCollection(T) to wrap your collection. Would this be less of a compromise in your scenario?
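A small sketch of the ReadOnlyObservableCollection<T> idea follows. The `TaskBoard` class is hypothetical. Note that the CollectionChanged event on the wrapper is only reachable through the INotifyCollectionChanged interface, so subscribers need a cast:

```csharp
using System;
using System.Collections.ObjectModel;
using System.Collections.Specialized;

// Hypothetical class for illustration only.
public class TaskBoard
{
    private readonly ObservableCollection<string> tasks_ =
        new ObservableCollection<string>();
    private readonly ReadOnlyObservableCollection<string> view_;

    public TaskBoard()
    {
        // The wrapper forwards change notifications from tasks_.
        this.view_ = new ReadOnlyObservableCollection<string>(this.tasks_);
    }

    public ReadOnlyObservableCollection<string> Tasks
    {
        get { return this.view_; }
    }

    public void Add(string task)
    {
        this.tasks_.Add(task);
    }
}
```

A consumer would subscribe with `((INotifyCollectionChanged)board.Tasks).CollectionChanged += handler;` and still could not add or remove items through `Tasks`.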
I'm a Java developer, but I think this is the same for C#.
I never expose a private collection property, because other parts of the program can change it without the parent noticing. In the getter I return an array with the objects of the collection, and in the setter I call clearAll() on the collection followed by addAll().
I have spent quite a few hours pondering the subject of exposing list members. In a similar question to mine, Jon Skeet gave an excellent answer. Please feel free to have a look.
ReadOnlyCollection or IEnumerable for exposing member collections?
I am usually quite paranoid about exposing lists, especially when developing an API.
I have always used IEnumerable for exposing lists, as it is quite safe, and it gives that much flexibility. Let me use an example here:
public class Activity
{
private readonly IList<WorkItem> workItems = new List<WorkItem>();
public string Name { get; set; }
public IEnumerable<WorkItem> WorkItems
{
get
{
return this.workItems;
}
}
public void AddWorkItem(WorkItem workItem)
{
this.workItems.Add(workItem);
}
}
Anyone who codes against an IEnumerable is quite safe here. If I later decide to use an ordered list or something, none of their code breaks and it is still nice. The downside of this is IEnumerable can be cast back to a list outside of this class.
For this reason, a lot of developers use ReadOnlyCollection for exposing a member. This is quite safe since it can never be cast back to a list. For me I prefer IEnumerable since it provides more flexibility, should I ever want to implement something different than a list.
I have come up with a new idea I like better. Using IReadOnlyCollection:
public class Activity
{
private readonly IList<WorkItem> workItems = new List<WorkItem>();
public string Name { get; set; }
public IReadOnlyCollection<WorkItem> WorkItems
{
get
{
return new ReadOnlyCollection<WorkItem>(this.workItems);
}
}
public void AddWorkItem(WorkItem workItem)
{
this.workItems.Add(workItem);
}
}
I feel this retains some of the flexibility of IEnumerable and is encapsulated quite nicely.
I posted this question to get some input on my idea. Do you prefer this solution to IEnumerable? Do you think it is better to use a concrete return value of ReadOnlyCollection? This is quite a debate and I want to try and see what are the advantages/disadvantages that we all can come up with.
EDIT
First of all thank you all for contributing so much to the discussion here. I have certainly learned a ton from each and every one and would like to thank you sincerely.
I am adding some extra scenarios and info.
There are some common pitfalls with IReadOnlyCollection and IEnumerable.
Consider the example below:
public IReadOnlyCollection<WorkItem> WorkItems
{
get
{
return this.workItems;
}
}
The above example can be cast back to a list and mutated, even though the interface is read-only. The interface, despite its name, does not guarantee immutability. It is up to you to provide an immutable solution, so you should return a new ReadOnlyCollection, which wraps the list and exposes no mutating members. Alternatively, by returning a new list (essentially a copy), you keep the state of your object safe and sound.
Richiban says it best in his comment: an interface only guarantees what something can do, not what it cannot do.
See below for an example:
public IEnumerable<WorkItem> WorkItems
{
get
{
return new List<WorkItem>(this.workItems);
}
}
The above can be cast and mutated, but the state of your object is unaffected, because the caller only receives a copy.
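The three variants discussed here can be compared side by side. This is a sketch with a hypothetical `Basket` class; the property names are invented for the comparison:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical class for illustration only.
public class Basket
{
    private readonly List<string> items_ = new List<string> { "apple" };

    // Returns the list itself: a cast to List<string> succeeds
    // and mutates the internal state.
    public IReadOnlyCollection<string> Exposed
    {
        get { return this.items_; }
    }

    // Returns a wrapper: a cast to List<string> fails.
    public IReadOnlyCollection<string> Wrapped
    {
        get { return new ReadOnlyCollection<string>(this.items_); }
    }

    // Returns a copy: a cast succeeds but only the copy is mutated.
    public IReadOnlyCollection<string> Copied
    {
        get { return new List<string>(this.items_); }
    }
}
```

The wrapper is cheap and stays in sync with the internal list; the copy is isolated but allocates on every access.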
Another, more outside-the-box option is a custom collection class. Consider the following:
public class Bar : IEnumerable<string>
{
private List<string> foo;
public Bar()
{
this.foo = new List<string> { "123", "456" };
}
public IEnumerator<string> GetEnumerator()
{
return this.foo.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
The class above can have methods for mutating foo the way you want it to be, but your object can never be cast to a list of any sort and mutated.
Carsten Führmann makes a fantastic point about yield return statements in IEnumerables.
One important aspect seems to be missing from the answers so far:
When an IEnumerable<T> is returned to the caller, they must consider the possibility that the returned object is a "lazy stream", e.g. a collection built with "yield return". That is, the performance penalty for producing the elements of the IEnumerable<T> may have to be paid by the caller, for each use of the IEnumerable. (The productivity tool "Resharper" actually points this out as a code smell.)
By contrast, an IReadOnlyCollection<T> signals to the caller that there will be no lazy evaluation. (The Count property, as opposed to the Count extension method of IEnumerable<T> (which is inherited by IReadOnlyCollection<T> so it has the method as well), signals non-laziness. And so does the fact that there seem to be no lazy implementations of IReadOnlyCollection.)
This is also valid for input parameters, as requesting an IReadOnlyCollection<T> instead of IEnumerable<T> signals that the method needs to iterate several times over the collection. Sure the method could create its own list from the IEnumerable<T> and iterate over that, but as the caller may already have a loaded collection at hand it would make sense to take advantage of it whenever possible. If the caller only has an IEnumerable<T> at hand, he only needs to add .ToArray() or .ToList() to the parameter.
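As a sketch of the input-parameter case, here is a method that needs two passes over its input and therefore asks for an IReadOnlyCollection<T>. The `Stats` class and `Variance` method are hypothetical names for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class Stats
{
    // Two passes (mean, then squared deviations), so a materialized
    // collection is requested instead of a possibly lazy IEnumerable<T>.
    public static double Variance(IReadOnlyCollection<double> values)
    {
        if (values.Count == 0)
            throw new ArgumentException("values must not be empty", "values");

        double mean = values.Sum() / values.Count;
        return values.Sum(v => (v - mean) * (v - mean)) / values.Count;
    }
}
```

Arrays and List<T> can be passed directly; a caller holding only a query would append `.ToList()` or `.ToArray()` at the call site.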
What IReadOnlyCollection does not do is prevent the caller from casting to some other collection type. For such protection, one would have to use the class ReadOnlyCollection<T>.
In summary, the only thing IReadOnlyCollection<T> does relative to IEnumerable<T> is add a Count property and thus signal that no laziness is involved.
Talking about class libraries, I think IReadOnly* is really useful, and I think you're doing it right :)
It's all about immutable collections. Originally there were only fixed-size arrays, and enlarging one was a chore, so .NET added mutable collections to the framework that handle the ugly parts for you. But IMHO they never gave proper guidance on immutable collections, which are extremely useful, especially in high-concurrency scenarios where sharing mutable state is always a PITA.
If you look at other languages in use today, such as Objective-C, you will see that the rules are completely inverted. Classes almost always exchange immutable collections: the interface exposes only immutable collections and the class uses mutable ones internally (yes, they have them, of course). If outsiders should be able to change the collection (when the class is stateful), the class exposes dedicated methods for that.
My limited experience with other languages leads me to think that .NET lists are very powerful, but that immutable collections exist for a reason :)
In this case it is not a matter of helping the caller of an interface avoid changing all their code when you change the internal implementation, as with IList vs. List. With IReadOnly* you are protecting yourself and your class from being used improperly, and you avoid useless protective code that sometimes you could not even write (in the past, in some code I had to return a clone of the complete list to avoid this problem).
My take on concerns of casting and IReadOnly* contracts, and 'proper' usages of such.
If some code is being “clever” enough to perform an explicit cast and break the interface contract, then it is also “clever” enough to use reflection or otherwise do nefarious things such as access the underlying List of a ReadOnlyCollection wrapper object. I don’t program against such “clever” programmers.
The only thing that I guarantee is that after said IReadOnly*-interface objects are exposed, my code will not violate that contract and will not modify the returned collection object.
This means that I write code that returns List-as-IReadOnly*, e.g., and rarely opt for an actual read-only concrete type or wrapper. Using IEnumerable.ToList is sufficient to return an IReadOnly[List|Collection]; calling List.AsReadOnly adds little value against “clever” programmers who can still access the underlying list that the ReadOnlyCollection wraps.
In all cases, I guarantee that the concrete types of IReadOnly* return values are eager. If I ever write a method that returns an IEnumerable, it is specifically because the contract of the method is that which “supports streaming” fsvo.
As far as IReadOnlyList and IReadOnlyCollection, I use the former when there is 'an' implied stable ordering established that is meaningful to index, regardless of purposeful sorting. For example, arrays and Lists can be returned as an IReadOnlyList while a HashSet would better be returned as an IReadOnlyCollection. The caller can always assign the I[ReadOnly]List to an I[ReadOnly]Collection as desired: this choice is about the contract exposed and not what a programmer, “clever” or otherwise, will do.
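A brief sketch of that distinction, using a hypothetical `Catalog` class. It assumes a runtime where HashSet<T> implements IReadOnlyCollection<T> (.NET Framework 4.6 or later, or any modern .NET):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only.
public class Catalog
{
    private readonly List<string> steps_ = new List<string> { "first", "second" };
    private readonly HashSet<string> tags_ = new HashSet<string> { "red", "blue" };

    // Ordered and meaningfully indexable: expose as IReadOnlyList<T>.
    public IReadOnlyList<string> Steps
    {
        get { return this.steps_; }
    }

    // No meaningful order: expose only Count and enumeration.
    public IReadOnlyCollection<string> Tags
    {
        get { return this.tags_; }
    }
}
```

Since IReadOnlyList<T> derives from IReadOnlyCollection<T>, a caller can always assign `Steps` to the narrower interface when indexing is not needed.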
It seems that you can just return an appropriate interface:
...
private readonly List<WorkItem> workItems = new List<WorkItem>();
// Usually, there's no need the property to be virtual
public virtual IReadOnlyList<WorkItem> WorkItems {
get {
return workItems;
}
}
...
Since the workItems field is in fact a List<T>, the natural idea, IMHO, is to expose the widest read-only interface, which is IReadOnlyList<T> in this case.
!! IEnumerable vs IReadOnlyList !!
IEnumerable has been with us from the beginning of time. For many years, it was a de facto standard way to represent a read-only collection. Since .NET 4.5, however, there is another way to do that: IReadOnlyList.
Both collection interfaces are useful.
Often you have to implement a collection because it is not present among those of the .NET Framework. In the examples that I find online, often the new collection is built based on another collection (for example, List<T>): in this way it is possible to avoid the management of the resizing of the collection.
public class CustomCollection<T>
{
private List<T> _baseArray;
...
public CustomCollection(...)
{
this._baseArray = new List<T>(...);
}
}
What are the disadvantages of following this approach? Only lower performance because of the method calls to the base collection? Or the compiler performs some optimization?
Moreover, in some cases the field relating to the base collection (for example the above _baseArray) is declared as readonly. Why?
The main disadvantage is the fact that if you want to play nice you'll have to implement a lot of interfaces by hand (ICollection, IEnumerable, possibly IList... both generic and non-generic), and that's quite a bit of code. Not complex code, since you're just relaying the calls, but still code. The extra call to the inner list shouldn't make too big of a difference in most cases.
It's to enforce the fact that once the inner list is set, it cannot be changed into another list.
Usually it's best to inherit from one of the many built-in collection classes to make your own collection, instead of doing it the hard way. Collection<T> is a good starting point, and nobody is stopping you from inheriting List<T> itself.
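As a sketch of the Collection<T> route: it exposes protected virtual hooks (InsertItem, SetItem, RemoveItem, ClearItems) that every public mutation goes through. The `NonNullCollection<T>` class below is a hypothetical example that rejects null items:

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical class for illustration only.
public class NonNullCollection<T> : Collection<T> where T : class
{
    // Called by Add and Insert, including calls made through IList<T>.
    protected override void InsertItem(int index, T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        base.InsertItem(index, item);
    }

    // Called by the indexer setter.
    protected override void SetItem(int index, T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        base.SetItem(index, item);
    }
}
```

Because the hooks are virtual, the check still runs when the collection is used through an interface or base-class reference.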
For #2: if the private member is only assigned to in the constructor or when declared, it can be readonly. This is usually true if you only have one underlying collection and don't ever need to recreate it.
I'd say a pretty large disadvantage of this approach is that you can't use LINQ on your custom collection unless you implement IEnumerable. A better approach might be to subclass and force new implementation on methods as necessary, ex:
public class FooList<T> : List<T>
{
public new void Add(T item)
{
// any FooList-specific logic regarding adding items
base.Add(item);
}
}
As for the readonly keyword, it means that you can only set the variable in the constructor.
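One caveat worth checking with the `new` approach: List<T>.Add is not virtual, so the hiding method only runs when the call is made through the derived type. A small sketch, with a hypothetical `CountingList<T>`:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only.
public class CountingList<T> : List<T>
{
    public int Added { get; private set; }

    // Hides List<T>.Add; it does not override it.
    public new void Add(T item)
    {
        this.Added++;
        base.Add(item);
    }
}
```

If the same object is referenced as `List<T>` or `IList<T>`, `Add` resolves to the base implementation and the custom logic is skipped silently.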
I have a class called GestorePersonale which holds a list of instances of another class:
public List<Dipendente> Dipendenti
{
get;
private set;
}
I want to keep this list modifiable only from the methods the class exposes, and not directly. I noticed that with the code above, one could just do
var gp = new GestorePersonale();
gp.Dipendenti.Add( new Dipendente( ... ) );
and be able to perform any other kind of action on the List<Dipendente> itself.
I considered converting the first code snippet to
private List<Dipendente> dipendenti;
but I could find a few downsides to that:
This would break the personal rule of mine to try to always use the public fields over the private ones from inside the class's methods whenever possible (even though I'm not sure if it is good practice to do so, so any clarification would be welcome);
This would impair any external entities' ability to access the contents of the list for reading purposes only, like, say, to execute a LINQ query over the contents of the list.
What would be the best way to solve this situation?
You can wrap the list in a ReadOnlyCollection<T> and expose that:
private List<Dipendente> dipendenti;
private ReadOnlyCollection<Dipendente> readOnlyDipendenti;
public GestorePersonale()
{
dipendenti = new List<Dipendente>();
readOnlyDipendenti = new ReadOnlyCollection<Dipendente>(dipendenti);
}
public ReadOnlyCollection<Dipendente> Dipendenti
{
get { return readOnlyDipendenti; }
}
Internally, you have access to dipendenti and can add/remove items. External entities have access only to the ReadOnlyCollection<T> that wraps the list, so they can only read, but not add/remove items.
I would agree with dtb that ReadOnlyCollections is the way to go. However, you can return it from the property getter (using AsReadOnly) and drop the method.
private List<Dipendente> dipendenti = new List<Dipendente>();
public ReadOnlyCollection<Dipendente> ReadOnlyDipendenti
{
get
{
return dipendenti.AsReadOnly();
}
}
There are a few things you can do:
You can use ReadOnlyCollection.
You can return an IEnumerable<_type>.
You can wrap the list in another class.
You can roll your own collection class, implementing the appropriate interface.
The method you use depends on the functionality you need and what you want or need to expose to the user of your class.
What you have is a public property with a private accessor. It is very useful. It allows an instance to expose a value that is controlled (set) by the instance itself, e.g. a state.
For example, take a collection with a Count property. It makes no sense for it to have a public setter. An implementation could update the property (internally) when the collection is changed (to avoid having to count it each time).
Do a setter method or wrap the field in another class. This is a classic collection set and collection.add problem.
I know there have been a lot of posts on this, but it still confuses me why you should pass in an interface like IList and return an interface like IList back instead of the concrete list.
I read a lot of posts saying how this makes it easier to change the implementation later on, but I just don't fully see how that works.
Say if I have this method
public class SomeClass
{
public bool IsChecked { get; set; }
}
public void LogAllChecked(IList<SomeClass> someClasses)
{
foreach (var s in someClasses)
{
if (s.IsChecked)
{
// log
}
}
}
I am not sure how using IList will help me out in the future.
How about if I am already in the method? Should I still be using IList?
public void LogAllChecked(IList<SomeClass> someClasses)
{
//why not List<string> myStrings = new List<string>()
IList<string> myStrings = new List<string>();
foreach (var s in someClasses)
{
if (s.IsChecked)
{
myStrings.Add(s.IsChecked.ToString());
}
}
}
What do I get for using IList now?
public IList<int> onlySomeInts(IList<int> myInts)
{
IList<int> store = new List<int>();
foreach (var i in myInts)
{
if (i % 2 == 0)
{
store.Add(i);
}
}
return store;
}
How about now? Is there some new implementation of a list of int's that I will need to change out?
Basically, I need to see some actual code examples of how using IList would have solved some problem over just taking List into everything.
From my reading I think I could have used IEnumerable instead of IList since I am just looping through stuff.
Edit
So I have been playing around with some of my methods on how to do this. I am still not sure about the return type(if I should make it more concrete or an interface).
public class CardFrmVm
{
public IList<TravelFeaturesVm> TravelFeaturesVm { get; set; }
public IList<WarrantyFeaturesVm> WarrantyFeaturesVm { get; set; }
public CardFrmVm()
{
WarrantyFeaturesVm = new List<WarrantyFeaturesVm>();
TravelFeaturesVm = new List<TravelFeaturesVm>();
}
}
public class WarrantyFeaturesVm : AvailableFeatureVm
{
}
public class TravelFeaturesVm : AvailableFeatureVm
{
}
public class AvailableFeatureVm
{
public Guid FeatureId { get; set; }
public bool HasFeature { get; set; }
public string Name { get; set; }
}
private IList<AvailableFeature> FillAvailableFeatures(IEnumerable<AvailableFeatureVm> avaliableFeaturesVm)
{
List<AvailableFeature> availableFeatures = new List<AvailableFeature>();
foreach (var f in avaliableFeaturesVm)
{
if (f.HasFeature)
{
// nhibernate call to Load<>()
AvailableFeature availableFeature = featureService.LoadAvaliableFeatureById(f.FeatureId);
availableFeatures.Add(availableFeature);
}
}
return availableFeatures;
}
Now I am returning IList for the simple fact that I will then add this to my domain model, which has a property like this:
public virtual IList<AvailableFeature> AvailableFeatures { get; set; }
The above is an IList itself, as this is what seems to be the standard to use with NHibernate. Otherwise I might have returned IEnumerable, but I am not sure. Still, I can't figure out what the user would 100% need (that's where returning a concrete type has an advantage).
Edit 2
I was also thinking what happens if I want to do pass by reference in my method?
private void FillAvailableFeatures(IEnumerable<AvailableFeatureVm> avaliableFeaturesVm, IList<AvailableFeature> toFill)
{
foreach (var f in avaliableFeaturesVm)
{
if (f.HasFeature)
{
// nhibernate call to Load<>()
AvailableFeature availableFeature = featureService.LoadAvaliableFeatureById(f.FeatureId);
toFill.Add(availableFeature);
}
}
}
Would I run into problems with this, since callers could pass in an array (which has a fixed size)? Would it be better to take a concrete List instead?
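The array concern can be checked with a small sketch. `FillDemo` and `FillEvens` are hypothetical stand-ins for the method above, simplified to integers:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class FillDemo
{
    // Same shape as the method above: read from one sequence,
    // append matching items to a caller-supplied IList<T>.
    public static void FillEvens(IEnumerable<int> source, IList<int> toFill)
    {
        foreach (int i in source)
        {
            if (i % 2 == 0)
            {
                toFill.Add(i);
            }
        }
    }
}
```

An `int[]` converts implicitly to `IList<int>`, so the call compiles, but the array's Add implementation throws NotSupportedException at run time because arrays have a fixed size. Passing a `List<int>` works as expected.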
There are three questions here: what type should I use for a formal parameter? What should I use for a local variable? and what should I use for a return type?
Formal parameters:
The principle here is do not ask for more than you need. IEnumerable<T> communicates "I need to get the elements of this sequence from beginning to end". IList<T> communicates "I need to get and set the elements of this sequence in arbitrary order". List<T> communicates "I need to get and set the elements of this sequence in arbitrary order and I only accept lists; I do not accept arrays."
By asking for more than you need, you (1) make the caller do unnecessary work to satisfy your unnecessary demands, and (2) communicate falsehoods to the reader. Ask only for what you're going to use. That way if the caller has a sequence, they don't need to call ToList on it to satisfy your demand.
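A sketch of "ask only for what you use": a method that merely iterates can take IEnumerable<T> and then accepts arrays, lists, and queries alike. `Checker` and `CountChecked` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class Checker
{
    // Only enumerates its input, so IEnumerable<T> is all it asks for.
    public static int CountChecked(IEnumerable<bool> flags)
    {
        int count = 0;
        foreach (bool flag in flags)
        {
            if (flag) count++;
        }
        return count;
    }
}
```

Had the parameter been `List<bool>`, callers holding an array or a LINQ query would have to call `ToList()` first just to satisfy the signature.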
Local variables:
Use whatever you want. It's your method. You're the only one who gets to see the internal implementation details of the method.
Return type:
Same principle as before, reversed. Offer the bare minimum that your caller requires. If the caller only requires the ability to enumerate the sequence, only give them an IEnumerable<T>.
The most practical reason I've ever seen was given by Jeffrey Richter in CLR via C#.
The pattern is to take the basest class or interface possible for your arguments and return the most specific class or interface possible for your return types. This gives your callers the most flexibility in passing in types to your methods and the most opportunities to cast/reuse the return values.
For example, the following method
public void PrintTypes(IEnumerable items)
{
foreach(var item in items)
Console.WriteLine(item.GetType().FullName);
}
allows the method to be called passing in any type that can be cast to an enumerable. If you were more specific
public void PrintTypes(List items)
then, say, if you had an array and wished to print their type names to the console, you would first have to create a new List and fill it with your types. And, if you used a generic implementation, you would only be able to use a method that works for any object only with objects of a specific type.
When talking about return types, the more specific you are, the more flexible callers can be with it.
public List<string> GetNames()
you can use this return type to iterate the names
foreach(var name in GetNames())
or you can index directly into the collection
Console.WriteLine(GetNames()[0])
Whereas, if you were getting back a less specific type
public IEnumerable GetNames()
you would have to massage the return type to get the first value
Console.WriteLine(GetNames().OfType<string>().First());
IEnumerable<T> allows you to iterate through a collection. ICollection<T> builds on this and also allows for adding and removing items. IList<T> also allows for accessing and modifying them at a specific index. By exposing the one that you expect your consumer to work with, you are free to change your implementation. List<T> happens to implement all three of those interfaces.
Suppose you expose your property as a List<T> or even an IList<T> when all you want your consumer to have is the ability to iterate through the collection. They could then come to depend on the fact that they can modify the list. If you later decide to convert the actual data store from a List<T> to a Dictionary<T,U> and expose the dictionary keys as the value of the property (I have had to do exactly this before), consumers who have come to expect that their changes will be reflected inside your class will no longer have that capability. That's a big problem! If you expose the List<T> as an IEnumerable<T>, you can comfortably predict that your collection is not being modified externally. That is one of the powers of exposing List<T> as any of the above interfaces.
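The dictionary scenario might look roughly like this. The `Registry` class is hypothetical; the point is that the public property type does not reveal whether a list or a dictionary sits behind it:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only.
public class Registry
{
    // Was a List<string> in an earlier version (assumed for this sketch);
    // the switch to a dictionary is invisible to callers of Names.
    private readonly Dictionary<string, int> scores_ =
        new Dictionary<string, int>();

    public IEnumerable<string> Names
    {
        get { return this.scores_.Keys; }
    }

    public void Set(string name, int score)
    {
        this.scores_[name] = score;
    }
}
```

Callers that only ever enumerated `Names` keep compiling and working after the storage change.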
This level of abstraction goes the other direction when it comes to method parameters. When you pass your list to a method that accepts IEnumerable<T>, you can reasonably expect that your list is not going to be modified. And when you are the person implementing the method and you declare that you accept an IEnumerable<T> because all you need to do is iterate through the list, the person calling the method is free to call it with any data type that is enumerable. This allows your code to be used in unexpected, but perfectly valid ways.
From this it follows that your method implementation can represent its local variables however you wish. The implementation details are not exposed, leaving you free to change your code to something better without affecting the people calling your code.
You cannot predict the future. Assuming that a property's type will always be beneficial as a List<T> immediately limits your ability to adapt to unforeseen expectations of your code. Yes, you may never change that data type from a List<T>, but if you have to, you can be sure your code is ready for it.
Short Answer:
You pass the interface so that no matter what concrete implementation of that interface you use, your code will support it.
If you use a concrete implementation of list, another implementation of the same list will not be supported by your code.
Read a bit on inheritance and polymorphism.
Here's an example: I had a project once where our lists got very large, and resulting fragmentation of the large object heap was hurting performance. We replaced List with LinkedList. LinkedList does not contain an array, so all of a sudden, we had almost no use of the large object heap.
Mostly, we used the lists as IEnumerable<T>, anyway, so there was no further change needed. (And yes, I would recommend declaring references as IEnumerable if all you're doing is enumerating them.) In a couple of places, we needed the list indexer, so we wrote an inefficient IList<T> wrapper around the linked lists. We needed the list indexer infrequently, so the inefficiency was not a problem. If it had been, we could have provided some other implementation of IList, perhaps as a collection of small-enough arrays, that would have been more efficiently indexable while also avoiding large objects.
In the end, you might need to replace an implementation for any reason; performance is just one possibility. Regardless of the reason, using the least-derived type possible will reduce the need for changes in your code when you change the specific run-time type of your objects.
Inside the method, you should use var, instead of IList or List. When your data source changes to come from a method instead, your onlySomeInts method will survive.
The reason to use IList instead of List as a parameter type is that many things implement IList (List<T> and arrays, as two examples), but only List itself, or a class derived from it, can be passed where a List is required. It's more flexible to code to the interface.
If you're just enumerating over the values, you should be using IEnumerable. Every data type that can hold more than one value implements IEnumerable (or should), which makes your method hugely flexible.
Using IList instead of List makes writing unit tests significantly easier. It allows you to use a 'Mocking' library to pass and return data.
The other general reason for using interfaces is to expose the minimum amount of knowledge necessary to the user of an object.
Consider the (contrived) case where I have a data object that implements IList.
public class MyDataObject : IList<int>
{
public void Method1()
{
...
}
// etc
}
Your functions above only care about being able to iterate over a list. Ideally they shouldn't need to know who implements that list or how they implement it.
In your example, IEnumerable is the better choice, as you thought.
It is always a good idea to reduce the dependencies between your code as much as possible.
Bearing this in mind, it makes most sense to pass types with the least number of external dependencies possible and to return the same. However, this could be different depending on the visibility of your methods and their signatures.
If your methods form part of an interface, the methods will need to be defined using types available to that interface. Concrete types will probably not be available to interfaces, so they would have to return non-concrete types. You would want to do this if you were creating a framework, for example.
However, if you are not writing a framework, it may be advantageous to pass parameters with the weakest possible types (i.e. base classes, interfaces, or even delegates) and return concrete types. That gives the caller the ability to do as much as possible with the returned object, even if it is cast as an interface. However, this makes the method more fragile, as any change to the returned object type may break the calling code. In practice though, that generally isn't a major problem.
You accept an interface as a parameter for a method because that allows the caller to submit different concrete types as arguments. Given your example method LogAllChecked, the parameter someClasses could be of various types, and for the person writing the method, all might be equivalent (i.e. you'd write the exact same code regardless of the type of the parameter). But for the person calling the method, it can make a huge difference -- if they have an array and you're asking for a list, they have to change the array to a list or vice versa whenever calling the method, a total waste of time from both a programmer and a performance point of view.
Whether you return an interface or a concrete type depends upon what you want to let your callers do with the object you created -- this is an API design decision, and there's no hard and fast rule. You have to weigh their ability to make full use of the object against their ability to easily use a portion of the object's functionality (and of course whether you WANT them to be making full use of the object). For instance, if you return an IEnumerable, then you are limiting them to iterating -- they can't add or remove items from your object, they can only act against the objects. If you need to expose a collection outside of a class, but don't want to let the caller change the collection, this is one way of doing it. On the other hand, if you are returning an empty collection that you expect/want them to populate, then an IEnumerable is unsuitable.
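One caveat worth sketching: returning a private list typed as IEnumerable<T> limits callers only at compile time, because the object is still the list and can be cast back. The Roster class below is a made-up example contrasting that with an iterator, which hands out a separate sequence object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class used only to illustrate the difference.
public class Roster
{
    private readonly List<string> _names = new List<string> { "Ann", "Bob" };

    // Returns the list itself typed as IEnumerable<string>: callers can cast it back to List<string>.
    public IEnumerable<string> NamesByReference { get { return _names; } }

    // Iterator block: the compiler-generated sequence cannot be cast back to the list.
    public IEnumerable<string> NamesIterated
    {
        get { foreach (string name in _names) yield return name; }
    }

    public int Count { get { return _names.Count; } }
}
```

Whether the cast-back loophole matters depends on how much you trust your callers; for a public API the iterator (or a read-only wrapper) is the safer default.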
Here's my answer in this .NET 4.5+ world.
Use IList<T> and IReadOnlyList<T> instead of List<T>: there is no ReadOnlyList<T> class to pair with List<T>, whereas IList<T> pairs consistently with IReadOnlyList<T>.
Use IEnumerable<T> for minimum exposure (property) or requirement (parameter) if foreach is the only way it is used.
Use IReadOnlyList<T> if you also need to expose/use Count and the [] indexer.
Use IList<T> if you also allow callers to add/update/delete elements.
Because List<T> implements IReadOnlyList<T>, no explicit cast is needed.
An example class:
// manipulate the list within the class
private List<int> _numbers;
// callers can add/update/remove elements, but cannot reassign a new list to this property
public IList<int> Numbers { get { return _numbers; } }
// callers can use: .Count and .ReadonlyNumbers[idx], but cannot add/update/remove elements
public IReadOnlyList<int> ReadonlyNumbers { get { return _numbers; } }
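To see the two properties in use, here is a small sketch. The enclosing class name NumberHolder and the demo method are assumptions added for illustration; the two properties are the ones shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper class around the two properties shown above.
public class NumberHolder
{
    private readonly List<int> _numbers = new List<int>();

    public IList<int> Numbers { get { return _numbers; } }
    public IReadOnlyList<int> ReadonlyNumbers { get { return _numbers; } }
}

public static class NumberHolderDemo
{
    public static int Run()
    {
        var holder = new NumberHolder();
        holder.Numbers.Add(10);             // allowed: IList<int> exposes Add
        holder.Numbers.Add(20);
        // holder.ReadonlyNumbers.Add(30);  // does not compile: IReadOnlyList<int> has no Add
        // Both properties are views over the same underlying list.
        return holder.ReadonlyNumbers[1] + holder.ReadonlyNumbers.Count;
    }
}
```

Note that the read-only view is live: anything added through Numbers shows up in ReadonlyNumbers.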
I am writing a method that's intended to return a dictionary filled with configuration keys and values. The method that's building up this dictionary is doing so dynamically, so I need to return this set of keys and values as a collection (probably IDictionary<string, string>). In my various readings (sources escape me at the moment), the general consensus on returning collection types from method calls is not to.
I understand the reasons for this policy, and I tend to agree, but in cases like this I see no other alternative. This is my question: is there a way I can return this data to the caller, while following this principle?
Edit: The reason I've heard for not allowing this behavior is that a collection or dictionary type that is meant to be consumed (but not modified) by the client exposes too much behavior, giving the illusion that the caller can modify the type. Dictionary for example has Add and Remove methods, as well as a mutable indexer. If the values in the dictionary are meant to be read-only, these methods are superfluous at best. Further damage can be done if the internal collection is exposed, and the 'owner' of the collection is not anticipating changes to the collection from outside sources.
There are other reasons I've heard, but I can't recall them off-hand - these are the most pertinent in my situation.
Edit: More clarification: The problem I'm having is that I'm building an API, so I have no control over the client calling this function. Cloning the dictionary isn't a problem, but I'm trying to keep my API as clean as possible. Returning a dictionary with methods such as Add and Remove implies that the collection can or should be modified, which isn't the case. Modifications here are meaningless, and so I don't want to expose the promise of that functionality through the returned type's interface.
Resolution: To come to terms with my desire for a clean API, I'm going to write a custom Dictionary class that does not expose the mutating methods Add and Remove, or the set indexer. This type will not implement IDictionary, but I will write a method ToDictionary that will return the data within an IDictionary. It will implement IEnumerable<KeyValuePair<TKey, TValue>> in order to have access to the standard LINQ operations over enumerables. Now all I need is a name for my custom dictionary type... =) Thanks everyone.
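A sketch of roughly what that custom type could look like. The name ReadOnlyLookupTable is a placeholder (the post leaves the type unnamed), and copying the source in the constructor is an assumption on my part rather than something the post specifies.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical name; the original post left the type unnamed.
public sealed class ReadOnlyLookupTable<TKey, TValue> : IEnumerable<KeyValuePair<TKey, TValue>>
{
    private readonly Dictionary<TKey, TValue> _items;

    public ReadOnlyLookupTable(IDictionary<TKey, TValue> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        _items = new Dictionary<TKey, TValue>(source); // copy, so later changes to source are not seen
    }

    public int Count { get { return _items.Count; } }
    public TValue this[TKey key] { get { return _items[key]; } } // getter only, no set indexer
    public bool ContainsKey(TKey key) { return _items.ContainsKey(key); }
    public bool TryGetValue(TKey key, out TValue value) { return _items.TryGetValue(key, out value); }

    // Hands the caller an independent, mutable copy.
    public IDictionary<TKey, TValue> ToDictionary() { return new Dictionary<TKey, TValue>(_items); }

    public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator() { return _items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Because the type implements IEnumerable<KeyValuePair<TKey, TValue>>, the standard LINQ operators (Where, Select, and so on) work on it directly.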
"The general consensus on returning collection types from method calls is not to."
First time I've heard this, and it seems a stupid restriction to me.
"I understand the reasons for this policy"
Which are they then?
Edit:
The reasons you cite against returning collections are specific potential problems, which can be addressed specifically (by returning a read-only wrapper), without a blanket restriction on returning collections. But as I understand your situation, the collection is actually built by the method - in that case, changes made by the caller will not affect anything else and thus aren't something you really have to worry about, nor should you be overly restrictive in what the caller is supposed to be able to do with the object created specifically for him.
The main reason for this restriction is that it breaks polymorphism, constness and access control, if the class returns a member collection. If you are building up a collection to return, and the class does not retain it as a member, then this is not a problem.
That said, you may wish to think harder about why you wish to return this collection. What do you want the calling class to be able to do with the data? Can you implement this functionality by adding methods to your class, instead of returning a collection (e.g. myobj.getvalueFromKey(s) instead of myobj.getdictionary()[s])? Might it be more appropriate to return an object that only exposes the information you want it to, rather than simply return the collection (e.g. MyLookupTable MyClass::getLookupTable() rather than IDictionary MyClass::getLookupTable()).
If you have no control over the caller, and you must return a collection of a given type, then it should either be a copy of a member collection, or a new collection entirely, that the callee doesn't store.
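Both suggestions can be sketched together. The Settings class and its contents are invented for illustration: one method answers a single lookup without exposing the dictionary, the other returns a copy that the class does not keep.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; the stored values are placeholder data.
public class Settings
{
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>
    {
        { "theme", "dark" }
    };

    // Narrow accessor: answers one question without exposing the dictionary.
    public string GetValueFromKey(string key)
    {
        string value;
        return _values.TryGetValue(key, out value) ? value : null;
    }

    // Defensive copy: the caller may change the result freely without touching _values.
    public IDictionary<string, string> GetDictionaryCopy()
    {
        return new Dictionary<string, string>(_values);
    }
}
```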
In my opinion, returning collections is only a problem if changing the returned collection can have side effects, e.g. when several functions work with the same collection.
If you are only creating the collection, and not making a class's internal data public by returning it, I think it is okay to simply return the dictionary.
If the collection is used elsewhere in your code, and the code you return it to should not be able to change it, you have to clone it.
I've never heard that advice. There might be issues with thread safety if you do it poorly, but you can work around that if you need to.
Check out ReadOnlyCollection<T> for this (T is the element type, and the constructor takes an IList<T>). Change your return type and your last statement to
return new ReadOnlyCollection<T>(whateverYouWereReturningBefore);
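For a concrete picture, here is a small sketch using System.Collections.ObjectModel.ReadOnlyCollection<T>. The Playlist class is a made-up example; the wrapper behaviour is that of the framework type.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical class used only to show the wrapper in context.
public class Playlist
{
    private readonly List<string> _tracks = new List<string> { "Intro" };

    public void AddTrack(string title) { _tracks.Add(title); }

    // The wrapper is a live view: it reflects later changes to _tracks
    // but offers no public Add or Remove of its own.
    public ReadOnlyCollection<string> Tracks
    {
        get { return new ReadOnlyCollection<string>(_tracks); }
    }
}
```

A caller who casts the wrapper to IList<string> and calls Add gets a NotSupportedException, so the underlying list stays under the owner's control.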
Perhaps the confusion is with readonly collections (i.e. non-mutable collections)? If so, there's an excellent series of posts by Eric Lippert that goes into good detail on how to build these.
To quote: It is much easier to reason about a data structure if you know that it will never change. Since they cannot be modified, they are automatically threadsafe. Since they cannot be modified, you can maintain a stack of past “snapshots” of the structure, and suddenly undo-redo implementations become trivial.
How about returning an IEnumerable<T>? The caller can then easily filter the results any way they like via LINQ without mutating the original structure.
Obviously, for a Dictionary this will be IEnumerable<KeyValuePair<T,U>>.
Edit: For lookup you presumably want the ToLookup() extension method and ILookup.
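A minimal sketch of the ToLookup() route, assuming string keys and values; the ConfigLookup class and Build method are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not from the original post.
public static class ConfigLookup
{
    // Builds an ILookup from key/value pairs; ILookup has no Add or Remove members.
    public static ILookup<string, string> Build(IEnumerable<KeyValuePair<string, string>> pairs)
    {
        return pairs.ToLookup(p => p.Key, p => p.Value);
    }
}
```

Two differences from a dictionary are worth knowing: a key can map to several values, and indexing a lookup built this way with a missing key yields an empty sequence instead of throwing.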
I usually return an array of data rather than a collection type. In C#, for example, a lot of the collections implement a ToArray() method, and for those that don't, LINQ's ToArray() extension method works on any IEnumerable<T>.
Edit
Saw your comment to "No Refunds No Returns" answer. If you're returning a dictionary, an array may not work for you. In this case, I would recommend returning an interface rather than a concrete implementation.
In C# (for example):
public IDictionary<string, object> MyMethod()
{
Dictionary<string, object> myDictionary = new Dictionary<string, object>();
// do stuff here
return myDictionary;
}
Edit 2
You may need to implement your own read-only dictionary class and throw an exception in the necessary methods to prevent adding, etc.
In C# (Again, for example) (Not a complete solution):
public class ReadOnlyDictionary<TKey, TValue> : IDictionary<TKey, TValue>
{
private IDictionary<TKey, TValue> _innerDictionary;
public ReadOnlyDictionary(IDictionary<TKey, TValue> innerDictionary)
{
this._innerDictionary = innerDictionary;
}
public void Add(TKey key, TValue value)
{
throw new NotSupportedException();
}
public bool Remove(TKey key)
{
throw new NotSupportedException();
}
public void Add(KeyValuePair<TKey, TValue> item)
{
throw new NotSupportedException();
}
public void Clear()
{
throw new NotSupportedException();
}
public bool IsReadOnly
{
get { return true; }
}
public bool Remove(KeyValuePair<TKey, TValue> item)
{
throw new NotSupportedException();
}
public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
{
return _innerDictionary.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return _innerDictionary.GetEnumerator();
}
}
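On .NET 4.5 and later you may not need to hand-roll this at all: the framework ships System.Collections.ObjectModel.ReadOnlyDictionary<TKey, TValue>. A short sketch, where ConfigReader.Load and its contents are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical class; the keys and values are placeholder data.
public static class ConfigReader
{
    // Builds the dictionary privately, then returns it behind a read-only wrapper.
    public static IReadOnlyDictionary<string, string> Load()
    {
        var values = new Dictionary<string, string>
        {
            { "host", "localhost" },
            { "port", "8080" }
        };
        return new ReadOnlyDictionary<string, string>(values);
    }
}
```

The IReadOnlyDictionary<TKey, TValue> return type has no Add or Remove, and the wrapper throws NotSupportedException if a caller casts it to IDictionary<TKey, TValue> and tries to mutate it.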
"The reason I've heard for not allowing this behavior is that a collection or dictionary type that is meant to be consumed (but not modified) by the client exposes too much behavior, giving the illusion that the caller can modify the type."
But it's not an illusion. The caller can modify the type (well, the instance of the type). Why on earth is this a problem?
By the same logic, DataTable.Select() shouldn't return a DataRow[], since not only can the caller manipulate the membership of that array, it can change the underlying data!
And the idea of returning an immutable dictionary-like class that has a ToDictionary() method: what possible benefit accrues from doing that?
It's true that returning immutable objects makes it possible for you to implement interning without changing your API. But that's about the only advantage that I can think of.
A major problem with using mutable class objects to pass around data is that every mutable object encompasses two major kinds of state:
The contents of all its fields, and the objects referred to thereby.
The set of all references that exist to it, and things that might be done with those references.
If a method accepts a mutable object (be it a collection or something else) as a parameter, and its contract specifies that it will mutate it somehow (e.g. add items to a collection) but not keep any reference to it, then the caller will know that the set of references that exist to that object after the method call will be the same as it was before. If the caller never exposes the object to the outside world except to pass it to such methods, tracking what references exist will be easy.
On the other hand, if a method returns a mutable object to the caller, keeping track of what references may exist to the objects that are passed in and out may be difficult or impossible unless every caller receives a different mutable object. Having the called function create a new mutable object each time it's called, and populate that object with data as appropriate, is certainly a workable approach, but it's often better to let the caller create the new object. That way the caller may be able to recycle objects as appropriate (improving performance) and it will be clearer what's going on. For example, if Customer is a mutable class and one does:
Customer myCustomer = Database.GetCustomer("Fred Smith");
it's unclear whether making changes to myCustomer will have any effect on the database. By contrast, if the code were instead written as:
Customer myCustomer = new Customer();
Database.LoadCustomer(myCustomer, "Fred Smith");
it would be clearer that the data within myCustomer is not attached to the database (or anything else).
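A sketch of what the second pattern might look like end to end. The Customer properties and the body of LoadCustomer are stand-ins invented for illustration (no real database is involved); the point is only the contract: the method fills an object the caller owns and keeps no reference to it.

```csharp
using System;

// Hypothetical mutable data class.
public class Customer
{
    public string Name { get; set; }
    public string Email { get; set; }
}

// Hypothetical stand-in for the Database class mentioned above.
public static class Database
{
    // Fills an object the caller owns and keeps no reference to it afterwards.
    public static void LoadCustomer(Customer target, string name)
    {
        if (target == null) throw new ArgumentNullException("target");
        target.Name = name;
        // Placeholder data instead of a real query.
        target.Email = name.Replace(" ", ".").ToLowerInvariant() + "@example.com";
    }
}
```

Because the caller created the Customer, it can also reuse the same instance for a later load instead of allocating a new one each time.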