Return collection as read-only - c#

I have an object in a multi-threaded environment that maintains a collection of information, e.g.:
public IList<string> Data
{
    get
    {
        return data;
    }
}
I currently have return data; wrapped by a ReaderWriterLockSlim to protect the collection from sharing violations. However, to be doubly sure, I'd like to return the collection as read-only, so that the calling code is unable to make changes to the collection, only view what's already there. Is this at all possible?

If your underlying data is stored as a list, you can use the List<T>.AsReadOnly method.
If your data can only be enumerated, you can use the Enumerable.ToList method to copy it into a List<T> and call AsReadOnly on that.
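A minimal sketch of both suggestions, assuming a holder class and a Snapshot helper that are made up for illustration. AsReadOnly wraps the existing list without copying it, so the wrapper reflects later changes to that list, while ToList().AsReadOnly() wraps a fresh copy:

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical holder class, for illustration only.
public class DataHolder
{
    private readonly List<string> data = new List<string> { "a", "b" };

    // Wraps the existing list; no copy is made, so later changes to
    // 'data' are visible through the wrapper.
    public ReadOnlyCollection<string> Data
    {
        get { return data.AsReadOnly(); }
    }

    public void Add(string item)
    {
        data.Add(item);
    }

    // For any enumerable source: copy into a new List<T>, then wrap the copy.
    public static ReadOnlyCollection<string> Snapshot(IEnumerable<string> source)
    {
        return source.ToList().AsReadOnly();
    }
}
```

A caller that goes through the IList<string> interface and calls Add on the wrapper gets a NotSupportedException. Neither form adds any thread safety by itself.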

I voted for your accepted answer and agree with it--however might I give you something to consider?
Don't return a collection directly. Make an accurately named business logic class that reflects the purpose of the collection.
The main advantage of this comes in the fact that you can't add code to collections so whenever you have a native "collection" in your object model, you ALWAYS have non-OO support code spread throughout your project to access it.
For instance, if your collection was invoices, you'd probably have 3 or 4 places in your code where you iterated over unpaid invoices. You could have a getUnpaidInvoices method. However, the real power comes in when you start to think of methods like "payUnpaidInvoices(payer, account);".
When you pass around collections instead of writing an object model, entire classes of refactorings will never occur to you.
Note also that this makes your problem particularly nice. If you don't want people changing the collections, your container need contain no mutators. If you decide later that in just one case you actually HAVE to modify it, you can create a safe mechanism to do so.
How do you solve that problem when you are passing around a native collection?
Also, native collections can't be enhanced with extra data. You'll recognize this next time you find that you pass in (Collection, Extra) to more than one or two methods. It indicates that "Extra" belongs with the object containing your collection.

If your only intent is to keep calling code from making a mistake and modifying the collection when it should only be reading, all that is necessary is to return an interface that doesn't support Add, Remove, etc. Why not return IEnumerable<string>? Calling code would have to cast, which it is unlikely to do without knowing the internals of the property it is accessing.
If, however, your intent is to prevent the calling code from observing updates from other threads, you'll have to fall back on the solutions already mentioned and perform a deep or shallow copy, depending on your need.
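One caveat is worth a rough sketch: if the property simply returns the List<string> typed as IEnumerable<string>, a caller can still cast it back, whereas a read-only wrapper cannot be cast to the list. The class and property names below are hypothetical:

```csharp
using System.Collections.Generic;

// Hypothetical class, for illustration only.
public class Exposer
{
    private readonly List<string> data = new List<string> { "a" };

    // Same object, narrower static type: a cast back to List<string> succeeds.
    public IEnumerable<string> AsInterface
    {
        get { return data; }
    }

    // Wrapper object: a cast to List<string> fails, so callers cannot
    // reach Add or Remove on the underlying list.
    public IEnumerable<string> AsWrapper
    {
        get { return data.AsReadOnly(); }
    }
}
```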

I think you're confusing concepts here.
The ReadOnlyCollection provides a read-only wrapper for an existing collection, allowing you (Class A) to pass out a reference to the collection safe in the knowledge that the caller (Class B) cannot modify the collection (i.e. cannot add or remove any elements from the collection.)
There are absolutely no thread-safety guarantees.
If you (Class A) continue to modify the underlying collection after you hand it out as a ReadOnlyCollection then class B will see these changes, have any iterators invalidated, etc. and generally be open to any of the usual concurrency issues with collections.
Additionally, if the elements within the collection are mutable, both you (Class A) and the caller (Class B) will be able to change any mutable state of the objects within the collection.
Your implementation depends on your needs:
- If you don't need the caller (Class B) to see any further changes to the collection, then you can just clone the collection, hand it out, and stop caring.
- If you definitely need the caller (Class B) to see changes that are made to the collection, and you want this to be thread-safe, then you have more of a problem on your hands. One possibility is to implement your own thread-safe variant of the ReadOnlyCollection to allow locked access, though this will be non-trivial and non-performant if you want to support IEnumerable, and it still won't protect you against mutable elements in the collection.
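The first option (clone and hand out) might look roughly like this when combined with the ReaderWriterLockSlim from the question. This is only a sketch; the class name and the Add method are assumptions, not the asker's actual code:

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Threading;

// Hypothetical store, for illustration only.
public class ThreadSafeStore
{
    private readonly List<string> data = new List<string>();
    private readonly ReaderWriterLockSlim gate = new ReaderWriterLockSlim();

    public void Add(string item)
    {
        gate.EnterWriteLock();
        try { data.Add(item); }
        finally { gate.ExitWriteLock(); }
    }

    // Copies the list while holding the read lock, then returns a
    // read-only wrapper around the copy. The caller never sees the
    // internal list, so later writes cannot invalidate its enumeration.
    public ReadOnlyCollection<string> Data
    {
        get
        {
            gate.EnterReadLock();
            try { return new List<string>(data).AsReadOnly(); }
            finally { gate.ExitReadLock(); }
        }
    }
}
```

The copy is shallow: strings are immutable, so that is enough here, but mutable element types would still be shared with the caller.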

One should note that aku's answer will only protect the list as being read only. Elements in the list are still very writable. I don't know if there is any way of protecting non-atomic elements without cloning them before placing them in the read only list.

You can use a copy of the collection instead.
public IList<string> Data
{
    get { return new List<string>(data); } // returns a copy, not the internal list
}
That way it doesn't matter if it gets updated.

You want to use the yield keyword. Loop through the list and return each item with yield return. This lets the consumer use foreach without being able to modify the collection.
It would look something like this:
List<string> _Data;
public IEnumerable<string> Data
{
    get
    {
        foreach (string item in _Data)
        {
            yield return item;
        }
    }
}

Related

Should I return a collection when the reference to the collection is not changed?

I got a method which accepts a collection as below
public IList<CountryDto> ApplyDefaults(IList<CountryDto> dtos)
{
//Iterates the collection
//Validates the items in collection
//If items are invalid
//Removes items e.g dtos.Remove(currentCountryDto)
return dtos;//Do I need to do this?
}
My question is: since the reference to the collection is not changed, should I return the collection again from the method?
For: By returning the collection, I make it explicit in the signature, and the user is aware that the items in the collection could be different from the original source. It helps avoid ambiguity.
Against: Since the validation doesn't change the reference to the collection, it doesn't technically make sense to return it.
What is the best approach in this case?
Note: I am not sure if this question is opinion-based. I think I am probably missing something on the design side.
In every programming language, consistency of your own code or library with the approach of the core libraries is of high value. Hence, looking at how Collections.sort(), Collections.swap(), and Collections.shuffle() are defined in Java, I would suggest not returning the input parameter if you intend to modify it. In addition, your method should be named in such a way that it is obvious the input parameter gets modified. Otherwise your method will be considered to have hidden side effects.
Returning a value most often suggests that it is a new instance which reflects the work, performed by the method or is used for method-chaining in case of builders.
Given your comments/requirements:
- Does not need to report if defaults are applied.
- ApplyDefaults is complicated, invokes other services, and is not intended to produce a fluent API.
- ApplyDefaults is a "black box"; validation logic is injected, so the calling code doesn't know or care about the validation.
I think based on these, this method definitely should not return the reference to the incoming list, even if no validation is applied. Firstly, unless the API is clearly built around method chaining (which you indicated you do not want), returning a List<T> type usually indicates a new List is being created. Secondly, if a new list is not created, users may find themselves modifying the list in ways they didn't expect.
Consider:
IList<CountryDto> originalCountries = Service.GetCountries();
IList<CountryDto> validatedCountries = ApplyDefaults(originalCountries);
validatedCountries.Add(mySpecialCountry);
OutputOriginalCountries(originalCountries);
OutputValidatedCountries(validatedCountries);
This code isn't anything special; it's a fairly common pattern. If ApplyDefaults returned a reference to the same originalCountries collection, then mySpecialCountry would also be added to originalCountries. This would violate the Principle of Least Astonishment.
This would be exacerbated if this behaviour changed depending on whether or not items were validated/filtered. Since the validation logic is a black-box of behaviour that the caller doesn't know or care about, the API consumer could not depend on whether or not it returned the same reference. They would either have to do their own reference check (e.g., if (myValidatedCountries == myInputCountries)), or simply make a copy every time. Regardless, this becomes another weird behaviour that the programmer has to juggle when working with the API.
I think that the method should either:
A) always return a copied list with the items filtered out (public IList<CountryDto> ApplyDefaults(IEnumerable<CountryDto> dtos))
B) modify the incoming list in-place (public void ApplyDefaults(IList<CountryDto> dtos))
For option A, depending on the size of your list, this incurs the possibly unnecessary work of creating a copied list every time, even if no filtering is performed. However, the validation/filtering logic might be simpler. You might be able to use LINQ queries to apply the filtering nicely. Additionally, removing items from a list is generally costly, as the elements after each removed item have to be shifted in the internal array. So it might actually be faster to build a new list. You may even simplify the return type here to IEnumerable<CountryDto>; this allows for wider usage and makes it extremely obvious that you're creating a new collection.
For option B, if no validation is required, then no work is done and the method is essentially "free" (no array rebuilding, no copying, no reference changes). But if there is significant validation, the removal aspect may be costly. Since you're not method chaining, this version should have a void return type as it's much more obvious to the developer that this is modifying the list in-place. This follows other commonly known methods like List<T>.Sort. Furthermore, if a user wants to have a separate originalCountries and validatedCountries they can always make a copy:
var validatedCountries = originalCountries.ToList();
ApplyDefaults(validatedCountries);
Ultimately, which one you choose might depend on performance. If validation/removal is cheap and rare, then modifying the list in-place might be best. If you're expecting a lot of changes to the list, it might simply be faster to produce a new copy every time.
Regardless, I would suggest you name the method with a little more clarity as well. For example:
public IList<CountryDto> GetValidCountries(IEnumerable<CountryDto> dtos)
public void RemoveInvalidCountries(IList<CountryDto> dtos)
Of course, the naming might be different depending on your actual code context (I suspect ApplyDefaults is a common/inherited method name and not specific to CountryDto)
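To make the two options concrete, here is a rough sketch. The CountryDto shape and the IsValid rule are invented for illustration; the real validation is injected and unknown here:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical DTO; the real CountryDto is not shown in the question.
public class CountryDto
{
    public string Code { get; set; }
}

public static class CountryValidation
{
    // Placeholder rule, an assumption for this sketch only.
    private static bool IsValid(CountryDto dto)
    {
        return dto != null && !string.IsNullOrEmpty(dto.Code);
    }

    // Option A: never touches the input; always returns a new list.
    public static IList<CountryDto> GetValidCountries(IEnumerable<CountryDto> dtos)
    {
        return dtos.Where(IsValid).ToList();
    }

    // Option B: modifies the input in place and returns nothing.
    // Iterating backwards keeps the remaining indexes valid after RemoveAt.
    public static void RemoveInvalidCountries(IList<CountryDto> dtos)
    {
        for (int i = dtos.Count - 1; i >= 0; i--)
        {
            if (!IsValid(dtos[i]))
            {
                dtos.RemoveAt(i);
            }
        }
    }
}
```

If the parameter were a concrete List<CountryDto>, option B could use List<T>.RemoveAll instead of the manual loop.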
I'd rather return a Boolean (or an enum in a more elaborate case: collection preserved intact, changed, can't be validated, etc.).
// true if the collection is changed, false otherwise
public Boolean ApplyDefaults(IList<CountryDto> dtos) {
    Boolean result = false;
    //Iterates the collection
    //Validates the items in collection
    //If items are invalid:
    //  Removes items e.g dtos.Remove(currentCountryDto)
    //  result = true;
    ...
    return result;
}
...
if (ApplyDefaults(myData)) {
    // Collection is changed, do some extra stuff
}
First of all: you cannot change the caller's reference to the collection from inside the method, because by default the method receives a copy of that reference. You'd need to use the ref keyword in order to be able to change it.
Secondly: if your method has a return type, then it has to return a value. Your method is not called GetNewCollectionWithAppliedDefaults but ApplyDefaults, which implies that the collection will be modified. You should either return a Boolean to inform the user whether changes were made, or always return the parameter's collection (to allow chained method calls).
Also, why would you think it doesn't make sense to return a collection? I'd say there's no argument against it. Turn the question around: "why wouldn't I return the collection and could it harm my code"?
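The point about references and the ref keyword can be illustrated with a small sketch (all names here are hypothetical). Mutating the list through the parameter is visible to the caller; reassigning the parameter is not, unless it is passed by ref:

```csharp
using System.Collections.Generic;

public static class RefDemo
{
    // Mutates the object the caller's variable points to: visible to the caller.
    public static void Mutate(List<int> list)
    {
        list.Add(4);
    }

    // Reassigns only the local copy of the reference: invisible to the caller.
    public static void Reassign(List<int> list)
    {
        list = new List<int> { 99 };
    }

    // Reassigns the caller's variable itself, because it is passed by ref.
    public static void ReassignByRef(ref List<int> list)
    {
        list = new List<int> { 99 };
    }
}
```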
Technically, I would say there is not much difference between the two.
However, as you pointed out, a commonly used convention is that a function should only return an object it creates. Basically, that would mean that a function that returns an object is generating one, while a function that doesn't return anything is modifying the object passed as a parameter.
Again, this is only a convention and it is not widely used within the C# community, but in the Python community, for example, it is.
Some people return a Boolean (or an error code) instead, as an indicator of an error (like the old DOS command line). I don't like this approach and by far prefer raising exceptions that I can handle later on.
Finally, the best approach, in my view, is to return a value that indicates whether a change was made by the function, and possibly a value indicating how much of a change was made. It can be a Boolean or it can be the number of inserted/removed elements.
In any case, try to be consistent with the approach you choose, if not in all your code, at least within a single project. Sometimes you will have no other choice but to abide by the convention used by your teammates.
(My answer is based on the Java viewpoint; C++ and C# programmers might have a different take.) I think it's best to return the collection. The fact that the collection you're returning is the same collection that was given is just an implementation detail, and in future versions of the code, you might want to change that. Document that the collection returned might not be the same one passed in.
If, on the other hand, you want to lock in the design that this method modifies a collection in place, document it that way and don't return the collection. I prefer not to do it this way, but I can see advantages in some contexts.
In your case I would leave it void, since ApplyDefaults clearly states what it's doing.
Also, it might be a good idea to put ApplyDefaults in the collection itself. Implement IList, subclass List, or whatever, and then you'd call it like this:
myCollection.ApplyDefaults();
Which is just obvious.

Why would one use Stack<T> instead of List<T>?

List<T> from System.Collections.Generic does everything Stack<T> does, and more -- they're based on the same underlying data structure. Under what conditions is it correct to choose Stack<T> instead?
You would use a stack if you had a need for a Last In First Out collection of items. A list will allow you to access its items at any index. There are a lot of other differences, but I would say this is the most fundamental.
Update after your comment:
I would say that using Stack<T> makes a statement about how you want this code to be used. It's always good to plan for the future, but if you have a need for Stack<T> right now, and no compelling reason to use List<T>, then I would go with Stack<T>.
"why I would artificially limit myself to using Stack in new code"
There's your answer - you should use Stack when you have a need to enforce a contractual expectation that the data structure being used can only be operated on as a stack. Of course, the times you really want to do that are limited, but it's an important tool when appropriate.
For example, suppose the data being worked with doesn't make any sense unless the stack order is enforced. In those cases, you'd be opening yourself up to trouble if you made the data available as a list. By using a Stack (or a Queue, or any other order-sensitive structure) you can specify in your code exactly how the data is supposed to be used.
Well, you would want to use Stack if you were logically trying to represent a stack. It will convey the intention of the programmer throughout the code if you use a stack, and it will prevent inadvertent misuse of the data structure (unintentionally adding/removing/reading somewhere other than one end).
It's certainly possible that, rather than a concrete implementation, Stack could just be an interface. You could then have something like List implement that interface. The problem there is mostly a matter of convenience. If someone needs a stack they need to pick some specific implementation and remember ("Oh yeah, List is the preferred stack implementation") rather than just newing up the concrete type.
It's all about concept. A List is a List, and a Stack is a Stack, and they do two very different things. Their only commonality is their generic nature and their variable length.
A List is a variable-length collection of items in which any element can be accessed and overwritten by index, and to which items can be added and from which items can be removed at any such index.
A Stack is a variable-length collection of items supporting a LIFO access model; only the top element of the Stack can be accessed, and elements can be added to and removed from only that "endpoint" of the collection. The item 3 elements from the "top" can only be accessed by "popping" the two elements above it to expose it.
Use the correct tool for the job; use a List when you need "random" access to any element in the collection. Use a Stack when you want to enforce the more limited "top-only" access to elements in the collection. Use a Queue when you want to enforce a FIFO "pipeline"; items go in one end, out the other.
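As a small illustration of the LIFO and FIFO access models described above (the helper names are made up for this sketch):

```csharp
using System.Collections.Generic;

public static class OrderingDemo
{
    // Pushes the items in order, then pops until empty: last in, first out.
    public static List<int> DrainStack(IEnumerable<int> items)
    {
        var stack = new Stack<int>();
        foreach (int item in items)
        {
            stack.Push(item);
        }
        var result = new List<int>();
        while (stack.Count > 0)
        {
            result.Add(stack.Pop());
        }
        return result;
    }

    // Enqueues the items in order, then dequeues until empty: first in, first out.
    public static List<int> DrainQueue(IEnumerable<int> items)
    {
        var queue = new Queue<int>();
        foreach (int item in items)
        {
            queue.Enqueue(item);
        }
        var result = new List<int>();
        while (queue.Count > 0)
        {
            result.Add(queue.Dequeue());
        }
        return result;
    }
}
```

For the input 1, 2, 3, DrainStack yields 3, 2, 1 and DrainQueue yields 1, 2, 3.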
Besides being conceptually different, as the other answers already point out, there are also different methods in the Stack that make your code cleaner (and your life easier) than the corresponding code when using a List.
For example, a simple object pool snippet when using a Stack:
if (!pool.TryPop(out var obj))
{
    obj = new Foo();
}
And the same written as a List while keeping the operation O(1):
Foo obj;
int count = pool.Count;
if (count > 0)
{
    obj = pool[--count];
    pool.RemoveAt(count);
}
else
{
    obj = new Foo();
}
System.Collections.Generic.Stack<T> is a LIFO (Last-In, First-Out) data structure aka a stack.
Despite its name, SCG.List<T> is not the abstract data type known as a [linked] list: it is, in fact, a variable-length array.
Two very different creatures.

Collection properties should be read only - Loophole?

In the process of fixing code analysis errors, I'm changing my properties to have private setters. Then I started trying to understand why a bit more. From some research, MS says this:
A writable collection property allows a user to replace the collection with a completely different collection.
And the answer, here, states:
Adding a public setter on a List<T> object is dangerous.
But the reason why it's dangerous is not listed. And that's the part where I'm curious.
If we have this collection:
public List<Foo> Foos { get; set; }
Why make the setter private? Apparently we don't want client code to replace the collection, but if a client can remove every element, and then add whatever they want, what's the point? Is that not the same as replacing the collection entirely? How is value provided by following this code analysis rule?
Not exposing the setter prevents a situation where the collection is assigned a value of null. There's a difference between null and a collection without any values. Consider:
foreach (var value in this.myCollection) { /* do something */ }
When there are no values (i.e. someone has called Remove on every value), nothing bad happens. When this.myCollection is null, however, a NullReferenceException will be thrown.
Code Analysis is making the assumption that your code doesn't check that myCollection is null before operating on it.
It's probably also an additional safeguard for the thread-safe collection types defined in System.Collections.Concurrent. Imagine some thread trying to replace the entire collection by overwriting it. By getting rid of the public setter, the only option the thread has is to call the thread-safe Add and Remove methods.
If you're exposing an IList (which would be better practice) the consumer could replace the collection with an entirely different class that implements IList, which could have unpredictable effects. You could have subscribed to events on that collection, or on items in that collection that you're now incorrectly responding to.
In addition to SimpleCoder's null checking (which is, of course, important), there are other things you need to consider.
1) Someone could replace the List, causing big problems in thread safety.
2) Events raised by the replacement List won't reach subscribers of the old one.
3) You're exposing much, much more behavior than you need to. For example, I wouldn't even make the getter public.
To clarify point 3, don't do cust.Orders.clear(), but make a function called clearOrders() instead.
What if a customer isn't allowed to go over a credit limit? You have no control over that if you expose the list. You'd have to check that (and every other piece of business logic) every place where you might add an order. Yikes! That's a lot of potential for bugs. Instead, you can place it all in an addOrder(Order o) function and be right as rain.
For almost every (I'd say every, but sometimes cheating feels good...) business class, every property should be private for get and set, and if feasible make them readonly too. In this way, users of your class get only behaviors. Protect as much of your data as you can!
ReadOnlyCollection and ReadOnlyObservableCollection exist precisely for read-only collection scenarios.
ReadOnlyObservableCollection is very useful for one way binding in WPF/Silverlight/Metro apps.
If you have a Customer class with a List property, then this property should always have a private setter; otherwise it can be replaced from outside the customer object via:
customer.Orders = new List<Order>();
// this could overwrite data.
Always use the add and remove methods of the collection.
The Orders List should be created inside the Customer constructor via:
Orders = new List<Order>();
Do you really want to check everywhere in your code whether customer.Orders != null before operating on the Orders?
Or you create the Orders property in your customer object as suggested and never check for customer.Orders == null; instead you just enumerate the Orders, and if its count is zero nothing happens.
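Putting these suggestions together, a Customer class might look roughly like the sketch below. The Order shape, the credit limit rule, and the member names are assumptions for illustration, not code from any of the answers:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical order type, for illustration only.
public class Order
{
    public decimal Amount { get; set; }
}

public class Customer
{
    // Created once and never replaced, so it can never be null.
    private readonly List<Order> orders = new List<Order>();
    private readonly decimal creditLimit;

    public Customer(decimal creditLimit)
    {
        this.creditLimit = creditLimit;
    }

    // No setter at all, and callers only get a read-only view.
    public ReadOnlyCollection<Order> Orders
    {
        get { return orders.AsReadOnly(); }
    }

    // The single place where the business rule is enforced.
    public void AddOrder(Order order)
    {
        if (order == null)
        {
            throw new ArgumentNullException("order");
        }
        if (orders.Sum(o => o.Amount) + order.Amount > creditLimit)
        {
            throw new InvalidOperationException("Credit limit exceeded.");
        }
        orders.Add(order);
    }
}
```

Enumerating customer.Orders on a new customer simply does nothing, with no null check required.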

IEnumerable<T> as return type

Is there a problem with using IEnumerable<T> as a return type?
FxCop complains about returning List<T> (it advises returning Collection<T> instead).
Well, I've always been guided by a rule "accept the least you can, but return the maximum."
From this point of view, returning IEnumerable<T> is a bad thing, but what should I do when I want to use "lazy retrieval"? Also, the yield keyword is such a goodie.
This is really a two part question.
1) Is there inherently anything wrong with returning an IEnumerable<T>
No, nothing at all. In fact, if you are using C# iterators this is the expected behavior. Converting it to a List<T> or another collection class pre-emptively is not a good idea. Doing so makes an assumption about the usage pattern of your caller. I find it's not a good idea to assume anything about the caller. They may have good reasons why they want an IEnumerable<T>. Perhaps they want to convert it to a completely different collection hierarchy (in which case a conversion to List is wasted).
2) Are there any circumstances where it may be preferable to return something other than IEnumerable<T>?
Yes. While it's not a great idea to assume much about your callers, it's perfectly okay to make decisions based on your own behavior. Imagine a scenario where you had a multi-threaded object which was queueing up requests into an object that was constantly being updated. In this case returning a raw IEnumerable<T> is irresponsible. As soon as the collection is modified, the enumerator is invalidated, and continuing to enumerate will throw an exception. Instead you could take a snapshot of the structure, say in List<T> form, and return that. In this case I would just return the object as the direct structure (or interface).
This is certainly the rarer case though.
No, IEnumerable<T> is a good thing to return here, since all you are promising is "a sequence of (typed) values". Ideal for LINQ etc, and perfectly usable.
The caller can easily put this data into a list (or whatever) - especially with LINQ (ToList, ToArray, etc).
This approach allows you to lazily spool back values, rather than having to buffer all the data. Definitely a goodie. I wrote up another useful IEnumerable<T> trick the other day, too.
IEnumerable is fine by me but it has some drawbacks. The client has to enumerate to get the results. It has no way to check for Count etc.
List is bad because you expose too much control; the client can add/remove etc. from it and that can be a bad thing.
Collection seems the best compromise, at least in FxCop's view.
I always use what seems appropriate in my context (e.g. if I want to return a read-only collection, I expose Collection as the return type and return List.AsReadOnly(), or IEnumerable for lazy evaluation through yield, etc.). Take it on a case-by-case basis.
About your principle: "accept the least you can, but return the maximum".
The key to managing the complexity of a large program is a technique called information hiding. If your method works by building a List<T>, it's not often necessary to reveal this fact by returning that type. If you do, then your callers may modify the list they get back. This removes your ability to do caching, or lazy iteration with yield return.
So a better principle for a function to follow is: "reveal as little as possible about how you work".
Returning IEnumerable<T> is OK if you're genuinely only returning an enumeration, and it will be consumed by your caller as such.
But as others point out, it has the drawback that the caller may need to enumerate if he needs any other info (for example Count). The .NET 3.5 extension method IEnumerable<T>.Count will enumerate behind the scenes if the return value does not implement ICollection<T>, which may be undesirable.
I often return IList<T> or ICollection<T> when the result is a collection - internally your method can use a List<T> and either return it as-is, or return List<T>.AsReadOnly if you want to protect against modification (e.g. if you're caching the list internally). AFAIK FxCop is quite happy with either of these.
"Accept the least you can, but return the maximum" is what I advocate. When a method returns an object, what justification do we have for not returning the actual type, and for limiting the capabilities of the object by returning a base type? This, however, raises a question: how do we know what the "maximum" (actual type) will be when we design an interface? The answer is very simple. Only in extreme cases, where the interface designer is designing an open interface that will be implemented outside the application/component, would they not know what the actual return type may be. A smart designer should always consider what the method should be doing and what an optimal/generic return type should be.
E.g., if I am designing an interface to retrieve a vector of objects, and I know the count of returned objects is going to be variable, I'll always assume a smart developer will use a List. If someone plans to return an Array, I'd question his capabilities, unless he/she is just returning the data from another layer that he/she doesn't own. And this is probably why FxCop advocates for ICollection (common base for List and Array).
The above being said, there are a couple of other things to consider:
- whether the returned data should be mutable or immutable
- whether the returned data should be shared across multiple callers
Regarding LINQ lazy evaluation, I am sure 95%+ of C# users don't understand the intricacies. It’s so non-OO-ish. OO promotes concrete state changes on method invocations. LINQ lazy evaluation promotes runtime state changes on an expression evaluation pattern (not something non-advanced users always follow).
One important aspect is that when you return a List<T> you are actually returning a reference. That makes it possible for a caller to manipulate your list. This is a common problem; for instance, a Business layer that returns a List<T> to a GUI layer.
Just because you say you're returning IEnumerable doesn't mean you can't return a List. The idea is to reduce unneeded coupling. All that the caller should care about is getting a list of things, rather than the exact type of collection used to contain that list. If you have something that's backed by an array, then getting something like Count is going to be fast anyway.
I think your own guidance is great -- if you are able to be more specific about what you're returning without a performance hit (e.g. you don't have to build a List out of your result), do so. But if your function legitimately doesn't know what type it's going to find, like if in some situations you'll be working with a List and in some with an Array, etc., then returning IEnumerable is the "best" you can do. Think of it as the "lowest common denominator" of everything you might want to return.
I can't accept the chosen answer. There are ways of dealing with the scenario described, but using a List or whatever else you're using isn't one of them. The moment the IEnumerable is returned you have to assume that the caller might do a foreach. In that case it doesn't matter if the concrete type is List or spaghetti. In fact, just indexing is a problem, especially if items are removed.
Any returned value is a snapshot. It may be the current contents of the IEnumerable, in which case, if it's cached, it should be a clone of the cached copy; if it's supposed to be more dynamic (like the results of a SQL query), then use yield return. However, allowing the container to mutate at will and supplying methods like Count and an indexer is a recipe for disaster in a multithreaded world. I haven't even gotten into the ability of the caller to call Add or Delete on a container your code is supposed to be in control of.
Also, returning a concrete type locks you into an implementation. Today you may be using a list internally. Tomorrow maybe you become multithreaded and want to use a thread-safe container, or an array, or a queue, or the Values collection of a dictionary, or the output of a LINQ query. If you lock yourself into a concrete return type then you have to either change a bunch of code or do a conversion before returning.
IEnumerable is cool because you can use the yield iterator that gives to the consumer just the data they need but there is a cost hidden in the construct.
Let me explain it with an example. Let's say I am consuming this method:
IEnumerable GetFilesFromFolder(string path)
So, what do I get? To get all the files of my folder I have to iterate the enumeration, and that's fine, after all that's how enumerations work, but what if, for any reason, I have to enumerate it twice?
The second time, should I expect a refreshed result, or is the result idempotent? I do not know. I have to check the docs of the library or method.
The call to the GetEnumerator method of the enumeration made by the consumer could, in fact, execute an I/O operation behind the scenes, or an HTTP call, or it could simply iterate an inner array; I cannot know for sure. I have to check the docs in the hope that this behavior is documented.
Does this detail matter? I think it does. At least from a performance perspective.
Even if the cost of iteration is low and CPU-bound, it is not zero, and it can get even worse with chains of enumerations, which often turn debugging sessions into a nightmare.
I prefer not to leave consumers of my library in doubt, so whenever I know my API returns few elements I use arrays as the return type, and only when the data to return is huge do I use IEnumerable or IAsyncEnumerable.
Anyway, if you want to return enumerations please document your API to tell consumers if the result is a snapshot or not.
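The hidden cost of enumerating twice can be made visible with a small sketch. The counter and method names are invented for illustration; the counter stands in for whatever expensive work (I/O, HTTP call) the iterator might do:

```csharp
using System.Collections.Generic;

public static class LazyDemo
{
    // Counts how many items the iterator body has produced so far.
    public static int Evaluations;

    // Iterator method: nothing in the body runs until someone enumerates,
    // and every new enumeration runs the body again from the start.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 3; i++)
        {
            Evaluations++;
            yield return i;
        }
    }
}
```

Calling Numbers() alone leaves Evaluations at 0. Enumerating the same result twice brings it to 6, whereas materializing it once with ToArray() and iterating the array twice would do the work only three times.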

Threadsafe foreach enumeration of lists

I need to enumerate through a generic IList<> of objects. The contents of the list may change, as in items being added or removed by other threads, and this will kill my enumeration with a "Collection was modified; enumeration operation may not execute."
What is a good way of doing a threadsafe foreach on an IList<>? Preferably without cloning the entire list. It is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures your list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading/writing to it.
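A minimal sketch of the cloning approach, assuming all access to the list goes through one wrapper. `SharedList<T>` is a hypothetical name. Only the references are copied, not the objects they point to, so this works even when the objects themselves cannot be cloned.

```csharp
using System.Collections.Generic;

public class SharedList<T>
{
    private readonly object _gate = new object();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_gate) { _items.Add(item); }
    }

    public bool Remove(T item)
    {
        lock (_gate) { return _items.Remove(item); }
    }

    // Copies the references into a new array while holding the lock.
    // The caller can enumerate the array without holding any lock.
    public T[] Snapshot()
    {
        lock (_gate) { return _items.ToArray(); }
    }
}
```

A caller would write `foreach (var item in shared.Snapshot()) { ... }`. The lock is held only for the duration of the copy, and changes made to the list during the loop do not affect the array being enumerated.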
There is no such operation. The best you can do is
lock(collection){
foreach (object o in collection){
...
}
}
Your problem is that an enumeration does not allow the IList to change. This means you have to avoid this while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on.
Serialize the access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
ICollection MyCollection;
// Instantiate and populate the collection
lock(MyCollection.SyncRoot) {
// Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that's a very interesting topic.
The best approach relies on the ReadWriteResourceLock, which used to have big performance issues due to the so-called convoy problem.
The best article I've found on the subject is this one by Jeffrey Richter, which presents his own method for a high-performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy while simultaneously adding and removing elements.
Could you clarify a few things? Are insertions and deletions happening only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near or on the location of the enumeration's current element?
This is certainly doable by creating a custom IEnumerable object with perhaps an integer index, but only if you can control all access to your IList<> object (for locking and maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex problem.
foreach depends on the fact that the collection will not change. If you want to iterate over a collection that can change, use the normal for construct and be prepared for nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
The default behavior for a simple data structure like a linked list, B-tree, or hash table is to enumerate in order from the first element to the last. It would not cause a problem to insert an element at a position the iterator had already passed, or at one the iterator would reach later, and such an event could be detected by the application and handled if the application required it. Detecting a change in the collection and throwing an error during enumeration was, as far as I can imagine, someone's (bad) idea of what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly. They have called their shiny new unbroken collections the concurrent collections (System.Collections.Concurrent), introduced in .NET 4.0.
I recently spent some time multi-threading a large application and had a lot of issues with foreach operating on lists of objects shared across threads.
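As an illustration of the concurrent collections mentioned above, the sketch below modifies a ConcurrentQueue<T> while enumerating it, which would throw with List<T>. `ConcurrentDemo` is a hypothetical name. The documentation describes the queue's enumerator as a moment-in-time snapshot, so items added during the loop are not expected to show up in it.

```csharp
using System.Collections.Concurrent;

public static class ConcurrentDemo
{
    // Enqueues one new item per item enumerated and returns the final count.
    public static int EnumerateWhileAdding(out int seen)
    {
        var queue = new ConcurrentQueue<int>(new[] { 1, 2, 3 });
        seen = 0;
        foreach (int n in queue)
        {
            // No "Collection was modified" exception is thrown here.
            queue.Enqueue(n + 100);
            seen++;
        }
        return queue.Count;
    }
}
```

Note that ConcurrentQueue<T> does not implement IList<T>, so code written against IList<> would need some refactoring to use it.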
In many cases you can use the good old for loop and immediately assign the element to a local variable to use inside the loop. Just keep in mind that all threads writing to the objects in your list should write to different data in those objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach(var p in Points)
{
// work with p...
}
Can be replaced by:
for(int i = 0; i < Points.Count; i ++)
{
Point p = Points[i];
// work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock, one that allows multiple concurrent readers but only a single writer (when there are no readers).
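One possible shape for such a wrapper, sketched with ReaderWriterLockSlim. `GuardedList<T>` and its members are hypothetical names, and this is only one of several ways to arrange the locking.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class GuardedList<T>
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly List<T> _items = new List<T>();

    // Writers get exclusive access.
    public void Add(T item)
    {
        _lock.EnterWriteLock();
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    // Several threads can run ForEach at the same time; writers wait until
    // all readers have left. The callback must not call Add on this instance,
    // because the lock is created with the default no-recursion policy.
    public void ForEach(Action<T> action)
    {
        _lock.EnterReadLock();
        try
        {
            foreach (T item in _items) action(item);
        }
        finally { _lock.ExitReadLock(); }
    }
}
```

The trade-off is that the read lock is held for the whole enumeration, so a slow callback delays every writer.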
This is something that I've recently had to deal with and to me it really depends on what you're doing with the list.
If you need to use the list as it is at a point in time (given the number of elements currently in it) AND other threads can only ADD to the end of the list, then maybe you just switch to a FOR loop with a counter. At the point you grab the count, you're only seeing X elements in the list. You can walk through the list (while others are adding to the end of it) and it should not cause a problem.
Now, if the list needs to have items taken OUT of it by other threads, or CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList, so you may need to refactor for a dictionary).
