Pattern for using IEnumerator<T> in interfaces - c#

I have a C# class which needs to process a sequence of items (IEnumerable<T>) across a bunch of methods, so I cannot simply foreach inside a method. I call .GetEnumerator() and pass this IEnumerator<T> around and it works great giving me the flexibility I need while looping through a single sequence.
Now I want to allow others to add logic into this process. The most natural way to do this is give them an interface with a method that accepts the IEnumerator<T>. Easy, done, and it works.
But I'm concerned that this is an anti-pattern. They have to know that the IEnumerator<T> has already had .MoveNext() called, so they can simply access .Current. Plus I don't see any precedent for using IEnumerator<T> in interfaces to be implemented.
What pitfalls am I not considering?
Is there another pattern which will allow me this same efficient mechanism (i.e. I don't want multiple copies being created/destroyed) without exposing the IEnumerator<T> itself?
Update: As I mentioned in a comment below: What I want is some sort of generic Stream<T>. I need to be able to effectively see the next item (IEnumerator.Current -> .Peek()) and consume it (IEnumerator<T>.MoveNext() -> .Pop()).
I used IEnumerator<T> because it fit the bill interface wise. I prefer to use common BCL types when they fit, but it seemed like I was abusing this one.
So question 3) Is there a class which fits this need? Or should I just create my own Stream which lazily executes the IEnumerator<T> internally? Then it would be entirely encapsulated. I'd like to not use many of the existing collections as they have internal storage, whereas I'd like the storage to be the IEnumerable<T> itself.
OK, it sounds like the consensus is that, due to IEnumerator<T> often being a value type and to the callee not knowing a priori the state of the IEnumerator<T>, it is generally a bad idea to pass it around.
The best suggestion I've heard is to create my own class which gets passed around. Any other suggestions?

You should definitely not pass the IEnumerator<T> around. Apart from anything else, it could have some very bizarre effects in some cases. What would you expect to happen here, for example?
using System;
using System.Collections.Generic;

class Test
{
    static void ShowCurrentAndNext(IEnumerator<int> iterator)
    {
        Console.WriteLine("ShowCurrentAndNext");
        Console.WriteLine(iterator.Current);
        iterator.MoveNext(); // Let's assume it returns true
        Console.WriteLine(iterator.Current);
    }

    static void Main()
    {
        List<int> list = new List<int> { 1, 2, 3, 4, 5 };
        using (var iterator = list.GetEnumerator())
        {
            iterator.MoveNext(); // Get things going
            ShowCurrentAndNext(iterator);
            ShowCurrentAndNext(iterator);
            ShowCurrentAndNext(iterator);
        }
    }
}
A couple of changes to try:
using (List<int>.Enumerator iterator = list.GetEnumerator())
and
using (IEnumerator<int> iterator = list.GetEnumerator())
Try to predict the results in each case :)
Now admittedly that's a particularly evil example, but it does demonstrate some corner cases associated with passing around mutable state. I would strongly encourage you to perform all your iteration in a "central" method which calls into appropriate other methods just with the current value.
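A minimal sketch of that "central method" idea, assuming the extension point only needs the current item. The names IItemHandler<T>, CentralLoop and RecordingHandler<T> are hypothetical, made up for this illustration; the point is only that the enumerator never leaves the one method that owns the foreach.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension point: implementers only ever see the current item.
public interface IItemHandler<T>
{
    void Handle(T item);
}

public static class CentralLoop
{
    // The only place that touches the enumerator (implicitly, via foreach).
    public static void Process<T>(IEnumerable<T> source, IEnumerable<IItemHandler<T>> handlers)
    {
        foreach (T item in source)
        {
            foreach (IItemHandler<T> handler in handlers)
            {
                handler.Handle(item);
            }
        }
    }
}

// Example handler that just records what it was given.
public sealed class RecordingHandler<T> : IItemHandler<T>
{
    public List<T> Seen { get; } = new List<T>();

    public void Handle(T item) => Seen.Add(item);
}
```

This does not cover the case where a handler has to consume more than one item; for that, a wrapper type that owns the enumerator is probably the better fit.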

I would strongly advise against passing the enumerator itself around; what reason do you have for this, aside from needing the current value?
Unless I'm missing something obvious, I would recommend having your utility functions simply take the type that you're enumerating as a parameter, then have a single outer foreach loop that handles the actual enumeration.
Perhaps you can provide some additional information as to why you've made this design decision so far.

Sounds to me like you might benefit from using an event so that you can push notification of items to be processed out to listeners. Regular .NET events are handled in the order they're subscribed, so you might go for a more explicit approach if ordering is required.
You may also like to look at the Reactive Framework.

If I understand this correctly, you have a number of methods that can all call MoveNext on the sequence and you want these methods to cooperate with each other, so you pass around an IEnumerator<T>. There's definitely some tight coupling here, as you mentioned, since you expect the enumerator to be in a particular state at the entrance to each method. It sounds like what you're really after here is something like the Stream class, which is both a collection (sort of) and an iterator (has a notion of Current location). I would wrap your iteration and any other state you need in your own class and have the various methods as members of that class.
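Here is a rough sketch of what such a wrapper class might look like, using the Peek/Pop vocabulary from the question's update. PeekableSequence<T> and its members are hypothetical names, not a BCL type. It assumes one item of look-ahead is acceptable, so the first item is pulled in the constructor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: exposes Peek/Pop and hides the IEnumerator<T> entirely.
public sealed class PeekableSequence<T> : IDisposable
{
    // Held as the interface type, so a struct enumerator is boxed once
    // and never copied when this wrapper is passed around.
    private readonly IEnumerator<T> _enumerator;
    private bool _hasCurrent;

    public PeekableSequence(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        _enumerator = source.GetEnumerator();
        _hasCurrent = _enumerator.MoveNext(); // advance once so Peek is valid
    }

    // True while there is an item left to Peek or Pop.
    public bool HasItem => _hasCurrent;

    // Returns the next item without consuming it.
    public T Peek()
    {
        if (!_hasCurrent) throw new InvalidOperationException("Sequence is exhausted.");
        return _enumerator.Current;
    }

    // Returns the next item and advances the underlying enumerator.
    public T Pop()
    {
        T item = Peek();
        _hasCurrent = _enumerator.MoveNext();
        return item;
    }

    public void Dispose() => _enumerator.Dispose();
}
```

Because the wrapper is a reference type with a well-defined state (either it has an item or it is exhausted), implementers of your interface no longer need to know whether MoveNext() has already been called.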

Related

What practices can safeguard against unexpected deferred execution with IEnumerable<T> as argument? [closed]

Closed 10 years ago.
There are a few questions similar to this one that deal with choosing the right input and output types. My question is: what good practices (method naming, choice of parameter type, or similar) can safeguard against deferred execution accidents?
These are most prevalent with IEnumerable which is a very common argument type because:
Follows the robustness principle "Be conservative in what you do, be liberal in what you accept from others"
Used extensively with Linq
IEnumerable is high in the collection hierarchy and predates newer collection types
However, it also introduces deferred execution. Now we might have gone wrong in designing our methods (especially extension methods) when we thought the best idea is to take the most basic type. So our methods looked like:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> lstObject)
{
    foreach (T t in lstObject)
        // some Fisher-Yates, maybe
}
The danger, obviously, arises when we mix the above function with lazy Linq, which is so susceptible to this.
var query = foos.Select(p => p).Where(p => p).OrderBy(p => p); //doesn't execute
//but
var query = foos.Select(p => p).Where(p => p).Shuffle().OrderBy(p => p);
//the second line executes up to a point.
A bigger edit:
Reopening this: a criticism of a language's functionality isn't constructive - however asking for good practices is where StackOverflow shines. Updated the question to reflect this.
A big edit here :
To clarify the above: my question is not about the second expression not getting evaluated, seriously not. Programmers know that. My worry is about the Shuffle method actually executing the query up to that point. See the first query, where nothing gets executed. Similarly, when constructing another Linq expression (which should be executed later), our custom function plays spoilsport. In other words, how do we let the caller know that Shuffle is not the kind of function they would want at that point in a Linq expression? I hope the point is driven home. Apologies! :) Though it's as simple as going and inspecting the method, I'm asking how you guys typically program defensively.
The above example may not be that dangerous, but you get the point. That is certain (custom) functions don't go well with the Linq idea of deferred execution. The problem is not just about performance, but also about unexpected side-effects.
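To make the difference concrete, here is a sketch (not the original Shuffle, whose body is elided above) with two hypothetical variants: ShuffleEager buffers the source as soon as it is called, while ShuffleDeferred is an iterator block and does nothing until it is enumerated. The Random parameter and the ShuffleDemo helper are assumptions added only so the behaviour can be observed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleSketch
{
    // Eager: the source is enumerated as soon as the method is called.
    public static IEnumerable<T> ShuffleEager<T>(this IEnumerable<T> source, Random rng)
    {
        List<T> buffer = source.ToList();
        FisherYates(buffer, rng);
        return buffer;
    }

    // Deferred: an iterator block; nothing runs until the result is enumerated.
    public static IEnumerable<T> ShuffleDeferred<T>(this IEnumerable<T> source, Random rng)
    {
        List<T> buffer = source.ToList();
        FisherYates(buffer, rng);
        foreach (T item in buffer)
        {
            yield return item;
        }
    }

    // In-place Fisher-Yates shuffle.
    private static void FisherYates<T>(List<T> items, Random rng)
    {
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }
}

public static class ShuffleDemo
{
    // Returns how many source items were pulled while the query was only being built.
    public static int ItemsPulledBeforeEnumeration(bool eager)
    {
        int pulled = 0;
        IEnumerable<int> source = Enumerable.Range(1, 5).Select(x => { pulled++; return x; });
        var rng = new Random(42);
        IEnumerable<int> query = eager
            ? source.ShuffleEager(rng).OrderBy(x => x)
            : source.ShuffleDeferred(rng).OrderBy(x => x);
        return pulled; // query is deliberately never enumerated here
    }
}
```

With the eager variant all five source items are pulled while the query is still being composed; with the deferred variant none are. Both have the same signature, which is exactly why the caller cannot tell them apart without documentation.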
But a function like this works magic with Linq:
public static IEnumerable<S> DistinctBy<S, T>(this IEnumerable<S> source,
                                              Func<S, T> keySelector)
{
    HashSet<T> seenKeys = new HashSet<T>(); //credits Jon Skeet
    foreach (var element in source)
        if (seenKeys.Add(keySelector(element)))
            yield return element;
}
As you can see both the functions take IEnumerable<>, but the caller wouldn't know how the functions react. So what are the general cautionary measures that you guys take here?
Name our custom methods appropriately so that it gives the idea for the caller that it does bode well or not with Linq?
Move lazy methods to a different namespace, and keep Linq-ish to another, so that it gives some sort of an idea at least?
Do not accept an IEnumerable as parameter for immediately executing methods but instead take a more derived type or a concrete type itself which thus leaves IEnumerable for lazy methods alone? This puts the burden on the caller to do the execution of possible un-executed expressions? This is quite possible for us, since outside Linq world we hardly deal with IEnumerables, and most basic collection classes implement up to ICollection at least.
Or anything else? I particularly like the 3rd option, and that's what I was going with, but I thought I'd get your ideas first. I have seen plenty of code (nice little Linq-like extension methods!) from even good programmers that accepts IEnumerable and does a ToList() or something similar on it inside the method. I don't know how they cope with the side effects.
Edit: After a downvote and an answer, I would like to clarify that it's not about programmers not knowing how Linq works (our proficiency could be at some level, but that's a different thing), but that many functions were written back then without taking Linq into account. Now chaining an immediately executing method along with Linq extension methods makes it dangerous. So my question is: is there a general guideline programmers follow to let the caller know what to use from the Linq side and what not to? It's more about programming defensively than if-you-don't-know-to-use-it-then-we-can't-help! (or at least I believe so).
As you can see both the functions take IEnumerable<>, but the caller wouldn't know how the functions react.
That's simply a matter of documentation. Look at the documentation for DistinctBy in MoreLINQ, which includes:
This operator uses deferred execution and streams the results, although
a set of already-seen keys is retained. If a key is seen multiple times,
only the first element with that key is returned.
Yes, it's important to know what a member does before you use it, and for things accepting/returning any kind of collection, there are various important things to know:
Will the collection be read immediately, or deferred?
Will the collection be streamed while results are returned?
If the declared collection type accepted is mutable, will the method try to mutate it?
If the declared collection type returned is mutable, will it actually be a mutable implementation?
Will the collection returned be changed by other actions (e.g. is it a read-only view on a collection which may be modified within the class)
Is null an acceptable input value?
Is null an acceptable element value?
Will the method ever return null?
All of these things are worth considering - and most of them were worth considering long before LINQ.
The moral is really, "Make sure you know how something behaves before you call it." That was true before LINQ, and LINQ hasn't changed it. It's just introduced two possibilities (deferred execution and streaming results) which were rarely present before.
Use IEnumerable wherever it makes sense, and code defensively.
As SLaks pointed out in a comment, deferred execution has been possible with IEnumerable since the beginning, and since C# 2.0 introduced the yield statement, it's been very easy to implement deferred execution yourself. For example, this method returns an IEnumerable that uses deferred execution to return some random numbers:
public static IEnumerable<int> RandomSequence(int length)
{
    Random rng = new Random();
    for (int i = 0; i < length; i++) {
        Console.WriteLine("deferred execution!");
        yield return rng.Next();
    }
}
So whenever you use foreach to loop over an IEnumerable, you have to assume that anything could happen in between iterations. It could even throw an exception, so you may want to put the foreach loop inside a try/finally.
If the caller passes in an IEnumerable that does something dangerous or never stops returning numbers (an infinite sequence), it's not your fault. You don't have to detect it and throw an error; just add enough exception handlers so that your method can clean up after itself in the event something goes wrong. In the case of something simple like Shuffle, there's nothing to do; just let the caller deal with the exception.
In the rare case that your method really can't deal with an infinite sequence, consider accepting a different type like IList. But even IList won't protect you from deferred execution - you don't know what class is implementing IList or what sort of voodoo it's doing to come up with each element! In the super-rare case that you really can't allow any unexpected code to run while you iterate, you should be accepting an array, not any kind of interface.
Deferred execution has nothing to do with types. Any Linq method that uses iterators has the potential for deferred execution if you write your code that way. Select(), Where() and OrderByDescending(), for example, all use iterators and hence defer execution. Yes, those methods expect an IEnumerable<T>, but that doesn't mean that IEnumerable<T> is the problem.
That is certain (custom) functions don't go well with the Linq idea of
deferred execution. The problem is not just about performance, but
also about unexpected side-effects.
So what are the general cautionary measures that you guys take here?
None. Honestly we use IEnumerable everywhere and don't have the problem of people not understanding "side effects". "the Linq idea of deferred execution" is central to its usefulness in things like Linq-to-SQL. It sounds to me like the design of the custom functions is not as clear as it could be. If people are writing code to use LINQ and they don't understand what it's doing, then that is the issue, not the fact that IEnumerable happens to be a base type.
All of your ideas are just wrappers around the fact that it sounds like you have programmers that just don't understand linq queries. If you don't need lazy execution, which it sounds like you don't, then just force everything to evaluate before the functions exit. Call ToList() on your results and return them in a consistent API that the consumer would like to work with - lists, arrays, collections or IEnumerables.

Why does Iterator define the remove() operation?

In C#, the IEnumerator interface defines a way to traverse a collection and look at the elements. I think this is tremendously useful because if you pass IEnumerable<T> to a method, it's not going to modify the original source.
However, in Java, Iterator defines the remove operation to (optionally!) allow deleting elements. There's no advantage in passing Iterable<T> to a method because that method can still modify the original collection.
remove's optionalness is an example of the refused bequest smell, but ignoring that (already discussed here), I'd be interested in the design decisions that prompted a remove operation to be defined on the interface.
What are the design decisions that led to remove being added to Iterator?
To put another way, what is the C# design decision that explicitly doesn't have remove defined on IEnumerator?
An Iterator is able to remove elements during iteration. You cannot iterate over a collection using an iterator and remove elements from the target collection using the remove() method of that collection. You will get a ConcurrentModificationException on the next call to Iterator.next(), because the iterator cannot know exactly how the collection was changed and so cannot know how to continue iterating.
When you use the iterator's own remove(), it knows how the collection was changed. Moreover, you cannot remove an arbitrary element of the collection, only the current one. This simplifies continuing the iteration.
Concerning the advantages of passing an Iterator or Iterable: you can always use Collections.unmodifiableSet() or Collections.unmodifiableList() to prevent modification of your collection.
It is probably due to the fact that removing items from a collection while iterating over it has always been a cause for bugs and strange behaviour. From reading the documentation it would suggest that Java enforces at runtime remove() is only called once per call to next() which makes me think it has just been added to prevent people messing up removing data from a list when iterating over it.
There are situations where you want to be able to remove elements using the iterator because it is the most efficient way to do it. For example, when traversing a linked data structure (e.g. a linked list), removing using the iterator is an O(1) operation ... compared to O(N) via the List.remove() operations.
And of course, many collections are designed so that modifying the collection during an iteration by any means other than Iterator.remove() will result in a ConcurrentModificationException.
If you have a situation where you don't want to allow modification via a collection iterator, wrapping the collection using Collections.unmodifiableXxx and using its iterator will have the desired effect. Alternatively, I think that Apache Commons provides a simple unmodifiable iterator wrapper.
By the way, IEnumerator suffers from the same "smell" as Iterator. Take a look at the Reset() method. I was also curious as to how the C# LinkedList class deals with the O(N) remove problem. It appears that it does this by exposing the internals of the list ... in the form of the First and Last properties whose values are LinkedListNode references. That violates another design principle ... and is (IMO) far more dangerous than Iterator.remove().
This is actually an awesome feature of Java. As you may well know, when iterating through a list in .NET to remove elements (for which there are a number of use cases), you only have two options.
var listToRemove = new List<T>(); // starts empty; collects the items to delete
foreach (var item in originalList)
{
    ...
    if (...)
    {
        listToRemove.Add(item);
    }
    ...
}
foreach (var item in listToRemove)
{
    originalList.Remove(item);
}
or
var iterationList = new List<T>(originalList); // snapshot to iterate over
for (int i = 0; i < iterationList.Count; i++)
{
    ...
    if (...)
    {
        // Remove by value: indexes into originalList shift after each removal
        originalList.Remove(iterationList[i]);
    }
    ...
}
Now, I prefer the second, but with Java I don't need all of that because while I'm on an item I can remove it and yet the iteration will continue! Honestly, though it may seem out of place, it's really an optimization in a lot of ways.

IEnumerable<T> as return type

Is there a problem with using IEnumerable<T> as a return type?
FxCop complains about returning List<T> (it advises returning Collection<T> instead).
Well, I've always been guided by a rule "accept the least you can, but return the maximum."
From this point of view, returning IEnumerable<T> is a bad thing, but what should I do when I want to use "lazy retrieval"? Also, the yield keyword is such a goodie.
This is really a two part question.
1) Is there inherently anything wrong with returning an IEnumerable<T>
No, nothing at all. In fact, if you are using C# iterators this is the expected behavior. Converting it to a List<T> or another collection class pre-emptively is not a good idea. Doing so is making an assumption about the usage pattern of your caller. I find it's not a good idea to assume anything about the caller. They may have good reasons why they want an IEnumerable<T>. Perhaps they want to convert it to a completely different collection hierarchy (in which case a conversion to List is wasted).
2) Are there any circumstances where it may be preferable to return something other than IEnumerable<T>?
Yes. While it's not a great idea to assume much about your callers, it's perfectly okay to make decisions based on your own behavior. Imagine a scenario where you had a multi-threaded object which was queueing up requests into an object that was constantly being updated. In this case returning a raw IEnumerable<T> is irresponsible. As soon as the collection is modified, the enumerable is invalidated and will cause an exception to occur. Instead you could take a snapshot of the structure and return that value, say in List<T> form. In this case I would just return the object as the direct structure (or interface).
This is certainly the rarer case though.
No, IEnumerable<T> is a good thing to return here, since all you are promising is "a sequence of (typed) values". Ideal for LINQ etc, and perfectly usable.
The caller can easily put this data into a list (or whatever) - especially with LINQ (ToList, ToArray, etc).
This approach allows you to lazily spool back values, rather than having to buffer all the data. Definitely a goodie. I wrote-up another useful IEnumerable<T> trick the other day, too.
IEnumerable is fine by me but it has some drawbacks. The client has to enumerate to get the results. It has no way to check for Count etc.
List is bad because you expose too much control; the client can add/remove etc. from it and that can be a bad thing.
Collection seems the best compromise, at least in FxCop's view.
I always use what seems appropriate in my context (e.g. if I want to return a read-only collection, I expose a collection as the return type and return List.AsReadOnly(), or IEnumerable for lazy evaluation through yield, etc.). Take it on a case-by-case basis.
About your principle: "accept the least you can, but return the maximum".
The key to managing the complexity of a large program is a technique called information hiding. If your method works by building a List<T>, it's not often necessary to reveal this fact by returning that type. If you do, then your callers may modify the list they get back. This removes your ability to do caching, or lazy iteration with yield return.
So a better principle for a function to follow is: "reveal as little as possible about how you work".
Returning IEnumerable<T> is OK if you're genuinely only returning an enumeration, and it will be consumed by your caller as such.
But as others point out, it has the drawback that the caller may need to enumerate if he needs any other info (for example Count). The .NET 3.5 extension method IEnumerable<T>.Count will enumerate behind the scenes if the return value does not implement ICollection<T>, which may be undesirable.
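The Count() cost can be made visible with a small sketch. CountSketch, LazyNumbers and the ItemsProduced counter are hypothetical names used only for illustration; the assumption is a lazy sequence that does not implement ICollection<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountSketch
{
    // Incremented each time the iterator below produces an item.
    public static int ItemsProduced;

    // A lazy sequence: the compiler-generated type does not implement ICollection<T>.
    public static IEnumerable<int> LazyNumbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    // Calls Count() twice on the same lazy sequence and reports the total work done.
    public static int ItemsProducedByCountingTwice(int count)
    {
        ItemsProduced = 0;
        IEnumerable<int> lazy = LazyNumbers(count);
        int first = lazy.Count();  // walks the whole sequence
        int second = lazy.Count(); // walks it again from the start
        return ItemsProduced;
    }
}
```

Each Count() call on the lazy sequence re-runs the iterator from the beginning, so counting twice does twice the work. A List<T> returned as IEnumerable<T> avoids this, because Count() can use the collection's own Count property.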
I often return IList<T> or ICollection<T> when the result is a collection - internally your method can use a List<T> and either return it as-is, or return List<T>.AsReadOnly if you want to protect against modification (e.g. if you're caching the list internally). AFAIK FxCop is quite happy with either of these.
"accept the least you can, but return the maximum" is what I advocate. When a method returns an object, what justifications we have to not return the actual type and limit the capabilities of the object by returning a base type. This however raises a question how do we know what the "maximum" (actual type) will be when we design an interface. The answer is very simple. Only in extreme cases where the interface designer is designing an open interface, which will be implemented outside the application/component, they would not know what the actual return type may be. A smart designer should always consider what the method should be doing and what an optimal/generic return type should be.
E.g. If I am designing an interface to retrieve a vector of objects, and I know the count of returned objects are going to be variable, I'll always assume a smart developer will always use a List. If someone plans to return an Array, I'd question his capabilities, unless he/she is just returning the data from another layer that he/she doesn't own. And this is probably why FxCop advocates for ICollection (common base for List and Array).
The above being said, there are couple of other things to consider
if the returned data should be mutable or immutable
if the returned data be shared across multiple callers
Regarding the LINQ lazy evaluations, I am sure 95%+ of C# users don't understand the intricacies. It’s so non-oo-ish. OO promotes concrete state changes on method invocations. LINQ lazy evaluation promotes runtime state changes on an expression evaluation pattern (not something non-advanced users always follow).
One important aspect is that when you return a List<T> you are actual returning a reference. That makes it possible for a caller to manipulate your list. This is a common problem—for instance, a Business layer that returns a List<T> to a GUI layer.
Just because you say you're returning IEnumerable doesn't mean you can't return a List. The idea is to reduce unneeded coupling. All that the caller should care about is getting a list of things, rather than the exact type of collection used to contain that list. If you have something that's backed by an array, then getting something like Count is going to be fast anyway.
I think your own guidance is great -- if you are able to be more specific about what you're returning without a performance hit (you don't have to e.g. build a List out of your result), do so. But if your function legitimately doesn't know what type it's going to find, like if in some situations you'll be working with a List and in some with an Array, etc., then returning IEnumerable is the "best" you can do. Think of it as the "greatest common multiple" of everything you might want to return.
I can't accept the chosen answer. There are ways of dealing with the scenario described, but using a List or whatever else you're using isn't one of them. The moment the IEnumerable is returned, you have to assume that the caller might do a foreach. In that case it doesn't matter if the concrete type is List or spaghetti. In fact, just indexing is a problem, especially if items are removed.
Any returned value is a snapshot. It may be the current contents of the IEnumerable, in which case, if it's cached, it should be a clone of the cached copy; if it's supposed to be more dynamic (like the results of a SQL query), then use yield return. However, allowing the container to mutate at will and supplying methods like Count and an indexer is a recipe for disaster in a multithreaded world. I haven't even gotten into the ability of the caller to call Add or Delete on a container your code is supposed to be in control of.
Also, returning a concrete type locks you into an implementation. Today you may be using a list internally. Tomorrow maybe you do become multithreaded and want to use a thread-safe container, or an array, or a queue, or the Values collection of a dictionary, or the output of a Linq query. If you lock yourself into a concrete return type, then you have to either change a bunch of code or do a conversion before returning.
IEnumerable is cool because you can use the yield iterator that gives to the consumer just the data they need but there is a cost hidden in the construct.
Let me explain it with an example. Let's say I am consuming this method:
IEnumerable GetFilesFromFolder(string path)
So, what do I get? To get all the files of my folder I have to iterate the enumeration, and that's fine, after all that's how enumerations work, but what if, for any reason, I have to enumerate it twice?
The second time should I expect a refreshed result or the result is idempotent? I do not know. I have to check the docs of the library / method.
The call to the GetEnumerator method of the enumeration done by the consumer, could, in fact, execute an I/O operation behind the scene, or an http call, or it could simply iterate an inner array, I can not know it for sure. I have to check the docs in the hope that this behavior is documented.
Does this detail matter? I think it does. At least from a performance perspective.
Even if the cost of iteration is low and CPU-bound, it is not zero, and it can get even worse with chains of enumerations, which often turn debugging sessions into a nightmare.
I prefer to not give the consumer of my library doubts so whenever I know my API returns few elements I always use arrays as return type, and only when the data to return is huge I use IEnumerable or IAsyncEnumerable.
Anyway, if you want to return enumerations please document your API to tell consumers if the result is a snapshot or not.
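As an illustration of the snapshot question, here is a small sketch with two hypothetical methods over the same backing list: one returns a deferred, "live" enumeration, the other returns an array copied at call time. The names SnapshotSketch, LiveNames and SnapshotNames are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotSketch
{
    // Deferred: the backing list is read again on every enumeration,
    // so callers see changes made after this method was called.
    public static IEnumerable<string> LiveNames(List<string> backing)
    {
        foreach (string name in backing)
        {
            yield return name;
        }
    }

    // Snapshot: the contents are copied once, at call time.
    public static string[] SnapshotNames(List<string> backing)
    {
        return backing.ToArray();
    }
}
```

Usage, under the same assumptions:

```csharp
var backing = new List<string> { "a" };
IEnumerable<string> live = SnapshotSketch.LiveNames(backing);
string[] snapshot = SnapshotSketch.SnapshotNames(backing);
backing.Add("b");
// live now enumerates two items; snapshot still holds one.
```

Both methods could be declared as returning IEnumerable<string>, which is why the documentation has to say which behaviour the caller gets.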

How do you design an enumerator that returns (theoretically) an infinite amount of items?

I'm writing code that looks similar to this:
public static IEnumerable<T> Unfold<T>(this T seed)
{
    while (true)
    {
        yield return [next (T)object in custom sequence];
    }
}
Obviously, this method is never going to return. (The C# compiler silently allows this, while R# gives me the warning "Function never returns".)
Generally speaking, is it bad design to provide an enumerator that returns an infinite number of items, without supplying a way to stop enumerating?
Are there any special considerations for this scenario? Mem? Perf? Other gotchas?
If we always supply an exit condition, which are the options? E.g:
an object of type T that represents the inclusive or exclusive boundary
a Predicate<T> continue (as TakeWhile does)
a count (as Take does)
...
Should we rely on users calling Take(...) / TakeWhile(...) after Unfold(...)? (Maybe the preferred option, since it leverages existing Linq knowledge.)
Would you answer this question differently if the code was going to be published in a public API, either as-is (generic) or as a specific implementation of this pattern?
So long as you document very clearly that the method will never finish iterating (the method itself returns very quickly, of course) then I think it's fine. Indeed, it can make some algorithms much neater. I don't believe there are any significant memory/perf implications - although if you refer to an "expensive" object within your iterator, that reference will be captured.
There are always ways of abusing APIs: so long as your docs are clear, I think it's fine.
"Generally speaking, is it bad desing
to provide an enumerator that returns
an infinite amount of items, without
supplying a way to stop enumerating?"
The consumer of the code can always stop enumerating (using break, for example, or other means). If your enumerator returns an infinite sequence, that doesn't mean the client of the enumerator is somehow forced never to break out of the enumeration. In fact, you can't make an enumerator which is guaranteed to be fully enumerated by a client.
Should we rely on users calling
Take(...) / TakeWhile(...) after
Unfold(...)? (Maybe the preferred
option, since it leverages existing
Linq knowledge.)
Yes, as long as you clearly specify in your documentation that the enumerator returns an infinite sequence and that breaking out of the enumeration is the caller's responsibility, everything should be fine.
Returning infinite sequences isn't a bad idea; functional programming languages have done it for a long time now.
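A sketch of how that might look in practice. The question elides how the next item is computed, so the `next` delegate below is a hypothetical stand-in for that logic, and yielding the seed itself as the first item is likewise an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnfoldSketch
{
    // Infinite sequence: seed, next(seed), next(next(seed)), ...
    // `next` is a hypothetical parameter standing in for the custom sequence logic.
    public static IEnumerable<T> Unfold<T>(this T seed, Func<T, T> next)
    {
        if (next == null) throw new ArgumentNullException(nameof(next));
        return Iterate(seed, next);
    }

    // Separate iterator block so the argument check above runs immediately.
    private static IEnumerable<T> Iterate<T>(T seed, Func<T, T> next)
    {
        T current = seed;
        while (true) // never ends on its own; the caller decides when to stop
        {
            yield return current;
            current = next(current);
        }
    }
}
```

The caller bounds the sequence with the existing Linq operators:

```csharp
var firstFive = 1.Unfold(x => x * 2).Take(5);              // 1, 2, 4, 8, 16
var below100 = 1.Unfold(x => x * 2).TakeWhile(x => x < 100); // 1, 2, 4, ..., 64
```

Only as many items are computed as the caller actually pulls, so the loop being infinite costs nothing by itself.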
I agree with Jon. The compiler transforms your method into a class implementing a simple state machine that keeps a reference to the current value (i.e. the value that will be returned via the Current property). I have used this approach several times to simplify code. If you clearly document the method's behavior, it should work just fine.
I would not use an infinite enumerator in a public API. C# programmers, myself included, are too used to the foreach loop. This would also be consistent with the .NET Framework; notice how the Enumerable.Range and Enumerable.Repeat methods take an argument to limit the number of items in the Enumerable. Microsoft chose to use Enumerable.Repeat(" ", 10) instead of Enumerable.Repeat(" ").Take(10) to avoid the infinite enumeration and I would adhere to their design choices.

Return collection as read-only

I have an object in a multi-threaded environment that maintains a collection of information, e.g.:
public IList<string> Data
{
    get
    {
        return data;
    }
}
I currently have return data; wrapped by a ReaderWriterLockSlim to protect the collection from sharing violations. However, to be doubly sure, I'd like to return the collection as read-only, so that the calling code is unable to make changes to the collection, only view what's already there. Is this at all possible?
If your underlying data is stored as a list, you can use the List<T>.AsReadOnly method.
If your data can only be enumerated, you can use the Enumerable.ToList method to copy it into a List<T> and then call AsReadOnly on it.
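A minimal sketch of the AsReadOnly approach, assuming the backing field is a List<string>; the class name DataHolder and the sample values are invented for illustration, and locking is left out to keep the example short.

```csharp
using System.Collections.Generic;

public class DataHolder
{
    // Assumed backing store; the question does not show the field's type.
    private readonly List<string> data = new List<string> { "a", "b" };

    public IList<string> Data
    {
        get
        {
            // Returns a ReadOnlyCollection<string> wrapper around the same list;
            // no elements are copied.
            return data.AsReadOnly();
        }
    }
}
```

Callers still see an IList<string>, but mutating members such as Add throw NotSupportedException, and IsReadOnly reports true.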
I voted for your accepted answer and agree with it--however might I give you something to consider?
Don't return a collection directly. Make an accurately named business logic class that reflects the purpose of the collection.
The main advantage of this comes in the fact that you can't add code to collections so whenever you have a native "collection" in your object model, you ALWAYS have non-OO support code spread throughout your project to access it.
For instance, if your collection was invoices, you'd probably have 3 or 4 places in your code where you iterated over unpaid invoices. You could have a getUnpaidInvoices method. However, the real power comes in when you start to think of methods like "payUnpaidInvoices(payer, account);".
When you pass around collections instead of writing an object model, entire classes of refactorings will never occur to you.
Note also that this makes your problem particularly nice. If you don't want people changing the collections, your container need contain no mutators. If you decide later that in just one case you actually HAVE to modify it, you can create a safe mechanism to do so.
How do you solve that problem when you are passing around a native collection?
Also, native collections can't be enhanced with extra data. You'll recognize this next time you find that you pass in (Collection, Extra) to more than one or two methods. It indicates that "Extra" belongs with the object containing your collection.
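To make the invoices idea concrete, here is one possible shape for such a class. Everything in it (Invoice, InvoiceBook, the member names, and the simplified PayUnpaidInvoices without payer or account parameters) is hypothetical and only meant to show where the logic would live.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Invoice
{
    public decimal Amount { get; private set; }
    public bool IsPaid { get; private set; }

    public Invoice(decimal amount) { Amount = amount; }

    public void MarkPaid() { IsPaid = true; }
}

public class InvoiceBook
{
    // The collection itself is never exposed.
    private readonly List<Invoice> invoices = new List<Invoice>();

    public void Add(Invoice invoice) { invoices.Add(invoice); }

    // Returns a snapshot, so callers cannot alter the book's own list.
    public IEnumerable<Invoice> GetUnpaidInvoices()
    {
        return invoices.Where(i => !i.IsPaid).ToList();
    }

    // The iteration logic lives here once instead of being repeated by callers.
    public decimal PayUnpaidInvoices()
    {
        decimal total = 0m;
        foreach (Invoice invoice in invoices)
        {
            if (!invoice.IsPaid)
            {
                invoice.MarkPaid();
                total += invoice.Amount;
            }
        }
        return total;
    }
}
```

With this shape the container has exactly the mutators you decide to give it, which is the point made above about not needing any if the data should be read-only.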
If your only intent is to keep calling code from making a mistake and modifying the collection when it should only be reading, all that is necessary is to return an interface that doesn't support Add, Remove, etc. Why not return IEnumerable<string>? Calling code would have to cast, which it is unlikely to do without knowing the internals of the property it is accessing.
If, however, your intent is to prevent the calling code from observing updates from other threads, you'll have to fall back on the solutions already mentioned and make a deep or shallow copy, depending on your need.
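The cast mentioned above is worth seeing once, since it shows that returning IEnumerable<string> is a convention rather than an enforcement. This is only an illustration; LeakyHolder and CastDemo are invented names.

```csharp
using System.Collections.Generic;

public class LeakyHolder
{
    private readonly List<string> data = new List<string> { "a" };

    // Exposes the same List<string> instance, just typed as IEnumerable<string>.
    public IEnumerable<string> Data { get { return data; } }

    public int Count { get { return data.Count; } }
}

public static class CastDemo
{
    // A caller who guesses the underlying type can still mutate the list.
    public static int AddThroughCast(LeakyHolder holder)
    {
        var list = holder.Data as IList<string>;
        if (list != null)
        {
            list.Add("sneaky");
        }
        return holder.Count;
    }
}
```

If that risk matters, wrapping the list (for example with AsReadOnly) or returning a copy closes the hole; if you only want to prevent accidents, the interface alone is usually enough.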
I think you're confusing concepts here.
The ReadOnlyCollection provides a read-only wrapper for an existing collection, allowing you (Class A) to pass out a reference to the collection safe in the knowledge that the caller (Class B) cannot modify the collection (i.e. cannot add or remove any elements from the collection.)
There are absolutely no thread-safety guarantees.
If you (Class A) continue to modify the underlying collection after you hand it out as a ReadOnlyCollection then class B will see these changes, have any iterators invalidated, etc. and generally be open to any of the usual concurrency issues with collections.
Additionally, if the elements within the collection are mutable, both you (Class A) and the caller (Class B) will be able to change any mutable state of the objects within the collection.
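A small sketch of the "Class B sees Class A's changes" point, using only ReadOnlyCollection<T> from System.Collections.ObjectModel; the LiveViewDemo name is made up for the example.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class LiveViewDemo
{
    public static int CountAfterOwnerAdds()
    {
        var inner = new List<string> { "a" };

        // The wrapper holds a reference to inner; it does not copy it.
        var view = new ReadOnlyCollection<string>(inner);

        // The owner keeps mutating the underlying list...
        inner.Add("b");

        // ...and the "read-only" view reflects the change.
        return view.Count;
    }
}
```

CountAfterOwnerAdds returns 2. The same mechanism is why an enumeration over the wrapper can be invalidated if the owner modifies the list on another thread while the caller is iterating.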
Your implementation depends on your needs:
- If you don't need the caller (Class B) to see any further changes to the collection, then you can just clone the collection, hand it out, and stop caring.
- If you definitely need the caller (Class B) to see changes that are made to the collection, and you want this to be thread-safe, then you have more of a problem on your hands. One possibility is to implement your own thread-safe variant of the ReadOnlyCollection to allow locked access, though this will be non-trivial and non-performant if you want to support IEnumerable, and it still won't protect you against mutable elements in the collection.
One should note that aku's answer will only protect the list as being read only. Elements in the list are still very writable. I don't know if there is any way of protecting non-atomic elements without cloning them before placing them in the read only list.
You can use a copy of the collection instead.
public IList<string> Data {
    get {
        // Copy the current contents; the caller gets its own list.
        return new List<string>(data);
    }
}
That way it doesn't matter if it gets updated.
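Since the question mentions a ReaderWriterLockSlim, the copy itself should probably be taken while holding the read lock. A sketch under that assumption follows; SnapshotHolder, sync and the Add method are illustrative names, not the asker's code.

```csharp
using System.Collections.Generic;
using System.Threading;

public class SnapshotHolder
{
    private readonly ReaderWriterLockSlim sync = new ReaderWriterLockSlim();
    private readonly List<string> data = new List<string>();

    public void Add(string item)
    {
        sync.EnterWriteLock();
        try { data.Add(item); }
        finally { sync.ExitWriteLock(); }
    }

    public IList<string> Data
    {
        get
        {
            // Hold the read lock only for the duration of the copy.
            sync.EnterReadLock();
            try { return new List<string>(data); }
            finally { sync.ExitReadLock(); }
        }
    }
}
```

Each caller gets an independent snapshot, so later writes do not affect a list that has already been handed out. The cost is one allocation and copy per access, and the elements themselves are still shared references.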
You want to use the yield keyword. You loop through the list and return each item with yield return. This allows the consumer to use foreach without being able to modify the collection.
It would look something like this:
List<string> _Data;
public IEnumerable<string> Data
{
    get
    {
        // Hand items out one at a time; the list itself is never exposed.
        foreach (string item in _Data)
        {
            yield return item;
        }
    }
}
