Why would one use Stack<T> instead of List<T>? - c#

List<T> from System.Collections.Generic does everything Stack<T> does, and more -- they're based on the same underlying data structure. Under what conditions is it correct to choose Stack<T> instead?

You would use stack if you had a need for a Last In First Out collection of items. A list will allow you to access it's items at any index. There are a lot of other differences but I would say this is the most fundamental.
Update after your comment:
I would say that using Stack<T> makes a statement about how you want this code to be used. It's always good to plan for the future, but if you have a need for Stack<T> right now, and no compelling reason to use List<T> then I would go with Stack<T>

why I would artificially limit myself to using Stack in new code
There's your answer - you should use Stack when you have a need to enforce a contractual expectation that the data structure being used can only be operated on as a stack. Of course, the times you really want to do that are limited, but it's an important tool when appropriate.
For example, supposed the data being worked with doesn't make any sense unless the stack order is enforced. In those cases, you'd be opening yourself up to trouble if you made the data available as a list. By using a Stack (or a Queue, or any other order-sensitive structure) you can specify in your code exactly how the data is supposed to be used.

Well, you would want to use Stack if you were logically trying to represent a stack. It will convey the intention of the programmer throughout the code if you use a stack, and it will prevent in-advertant mis-use of the data structure (unintentionally adding/removing/reading somewhere other than one end).
It's certainly possible that, rather than a concrete implementation, Stack could just be an interface. You could then have something like List implement that interface. The problem there is mostly a matter of convenience. If someone needs a stack they need to pick some specific implementation and remember ("Oh yeah, List is the preferred stack implementation") rather than just newing up the concrete type.

It's all about concept. A List is a List, and a Stack is a Stack, and they do two very different things. Their only commonality is their generic nature and their variable length.
A List is a variable-length collection of items in which any element can be accessed and overwritten by index, and to which items can be added and from which items can be removed at any such index.
A Stack is a variable-length collection of items supporting a LIFO access model; only the top element of the Stack can be accessed, and elements can be added to and removed from only that "endpoint" of the collection. The item 3 elements from the "top" can only be accessed by "popping" the two elements above it to expose it.
Use the correct tool for the job; use a List when you need "random" access to any element in the collection. Use a Stack when you want to enforce the more limited "top-only" access to elements in the array. Use a Queue when you want to enforce a FIFO "pipeline"; items go in one end, out the other.

Besides being conceptually different, as the other the answers already point out, there are also different methods in the Stack that make your code cleaner (and your life easier) than the correspondent code when using a List.
For example, a simple object pool snippet when using a Stack:
if (!pool.TryPop(out var obj))
{
obj = new Foo();
}
And the same written as a List while keeping the operation O(1):
Foo obj;
int count = pool.Count;
if (count > 0)
{
obj = pool[--count];
pool.RemoveAt(count);
}
else
{
obj = new Foo();
}

System.Collections.Generic.Stack<T> is a LIFO (Last-In, First-Out) data structure aka a stack.
Despite its name, SCG.List<T> is not the abstract data type known as a [linked] list: it is, in fact, a variable-length array.
Two very different creatures.

Related

Push Item to the end of an array

No, I can't use generic Collections. What I am trying to do is pretty simple actually. In php I would do something like this
$foo = [];
$foo[] = 1;
What I have in C# is this
var foo = new int [10];
// yeah that's pretty much it
Now I can do something like foo[foo.length - 1] = 1 but that obviously wont work. Another option is foo[foo.Count(x => x.HasValue)] = 1 along with a nullable int during declaration. But there has to be a simpler way around this trivial task.
This is homework and I don't want to explain to my teacher (and possibly the entire class) what foo[foo.Count(x => x.HasValue)] = 1 is and why it works etc.
The simplest way is to create a new class that holds the index of the inserted item:
public class PushPopIntArray
{
private int[] _vals = new int[10];
private int _nextIndex = 0;
public void Push(int val)
{
if (_nextIndex >= _vals.Length)
throw new InvalidOperationException("No more values left to push");
_vals[_nextIndex] = val;
_nextIndex++;
}
public int Pop()
{
if (_nextIndex <= 0)
throw new InvalidOperationException("No more values left to pop");
_nextIndex--;
return _vals[_nextIndex];
}
}
You could add overloads to get the entire array, or to index directly into it if you wanted. You could also add overloads or constructors to create different sized arrays, etc.
In C#, arrays cannot be resized dynamically. You can use Array.Resize (but this will probably be bad for performance) or substitute for ArrayList type instead.
But there has to be a simpler way around this trivial task.
Nope. Not all languages do everything as easy as each other, this is why Collections were invented. C# <> python <> php <> java. Pick whichever suits you better, but equivalent effort isn't always the case when moving from one language to another.
foo[foo.Length] won't work because foo.Length index is outside the array.
Last item is at index foo.Length - 1
After that an array is a fixed size structure if you expect it to work the same as in php you're just plainly wrong
Originally I wrote this as a comment, but I think it contains enough important points to warrant writing it as an answer.
You seem to be under the impression that C# is an awkward language because you stubbornly insist on using an array while having the requirement that you should "push items onto the end", as evidenced by this comment:
Isn't pushing items into the array kind of the entire purpose of the data structure?
To answer that: no, the purpose of the array data structure is to have a contiguous block of pre-allocated memory to mimic the original array structure in C(++) that you can easily index and perform pointer arithmetic on.
If you want a data structure that supports certain operations, such as pushing elements onto the end, consider a System.Collections.Generic.List<T>, or, if you insist on avoiding generics, a System.Collections.List. There are specializations that specify the underlying storage structure (such as ArrayList) but in general the whole point of the C# library is that you don't want to concern yourself with such details: the List<T> class has certain guarantees on its operations (e.g. insertion is O(n), retrieval is O(1) -- just like an array) and whether there is an array or some linked list that actually holds the data is irrelevant and is in fact dynamically decided based on the size and use case of the list at runtime.
Don't try to compare PHP and C# by comparing PHP arrays with C# arrays - they have different programming paradigms and the way to solve a problem in one does not necessarily carry over to the other.
To answer the question as written, I see two options then:
Use arrays the awkward way. Either create an array of Nullable<int>s and accept some boxing / unboxing and unpleasant LINQ statements for insertion; or keep an additional counter (preferably wrapped up in a class together with the array) to keep track of the last assigned element.
Use a proper data structure with appropriate guarantees on the operations that matter, such as List<T> which is effectively the (much better, optimised) built-in version of the second option above.
I understand that the latter option is not feasible for you because of the constraints imposed by your teacher, but then do not be surprised that things are harder than the canonical way in another language, if you are not allowed to use the canonical way in this language.
Afterthought:
A hybrid alternative that just came to mind, is using a List for storage and then just calling .ToArray on it. In your insert method, just Add to the list and return the new array.

Should I return a collection when the reference to the collection is not changed?

I got a method which accepts a collection as below
public IList<CountryDto> ApplyDefaults(IList<CountryDto> dtos)
{
//Iterates the collection
//Validates the items in collection
//If items are invalid
//Removes items e.g dtos.Remove(currentCountryDto)
return dtos;//Do I need to do this?
}
My question is since, the reference to the collection is not changed, should I return the collection again from the method?
For: By returning the collection back, I make it explicit in the signature and user is aware that the items in the collection could be different from the original source. Sort of it avoid ambiguity.
Against: Since the validation doesnt change the reference of the collection, it doesn't make sense technically to return it.
What is the best approach in this case?
Note: I am not sure if this question is opinion based. I think probably I missing something here on design side.
In every programming language consistency of your own code / library with the approach of the core libraries is of high value. Hence, inspecting how Collections.sort() or Collection.swap() and Collections.shuffle() are defined, I would suggest to not return the input parameter, if you intend to modify it. In addition, your method should be named in such a way, that it is obvious the input parameter gets modified. Otherwise your method will be considered to have side-effects.
Returning a value most often suggests that it is a new instance which reflects the work, performed by the method or is used for method-chaining in case of builders.
Given your comments/requirements:
Does not need to report if defaults are applied.
ApplyDefaults is complicated and invoking other services and not intended to produce a fluent API
ApplyDefaults is a "black box"; validation logic is injected so the calling code doesn't know/care about the validation
I think based on these, this method definitely should not return the reference to the incoming list, even if no validation is applied. Firstly, unless the API is clearly built around method chaining (which you indicated you do not want), returning a List<T> type usually indicates a new List is being created. Secondly, if a new list is not created, users may find themselves modifying the list in ways they didn't expect.
Consider:
IList<CountryDto> originalCountries = Service.GetCountries();
IList<CountryDto> validatedCountries = ApplyDefaults(originalCountries);
validatedCountries.Add(mySpecialCountry);
OutputOriginalCountries(originalCountries);
OutputValidatedCountries(validatedCountries);
This code isn't very special, and a fairly common pattern. If ApplyDefaults returned a reference to the same originalCountries collection, then mySpecialCountry would also be added to originalCountries. This would violate the Principle of Least Astonishment.
This would be exacerbated if this behaviour changed depending on whether or not items were validated/filtered. Since the validation logic is a black-box of behaviour that the caller doesn't know or care about, the API consumer could not depend on whether or not it returned the same reference. They would either have to do their own reference check (e.g., if (myValidatedCountries == myInputCountries)), or simply make a copy every time. Regardless, this becomes another weird behaviour that the programmer has to juggle when working with the API.
I think that the method should either:
A) always return a copied list with the items filtered out (public IList<CountryDto> ApplyDefaults(IEnumerable<CountryDto> dtos))
B) modify the incoming list in-place (public void ApplyDefaults(IList<CountryDto> dtos))
For option A, depending on the size of your list, this incurs the possible unnecessary work of creating a copied list every time even if no filtering is performed. However, the validation/filtering logic might be simpler. You might be able to use LINQ queries to apply the filtering nicely. Additionally, removing items from a list is generally costly as it has to rebuild the internal array. So it might actually be faster to build a new list. You may even simplify the signature here to be IEnumerable<CountryDto>; this allows for wider usage and is extremely obvious that you're creating a new collection.
For option B, if no validation is required, then no work is done and the method is essentially "free" (no array rebuilding, no copying, no reference changes). But if there is significant validation, the removal aspect may be costly. Since you're not method chaining, this version should have a void return type as it's much more obvious to the developer that this is modifying the list in-place. This follows other commonly known methods like List<T>.Sort. Furthermore, if a user wants to have a separate originalCountries and validatedCountries they can always make a copy:
var validatedCountries = originalCountries.ToList();
ApplyDefaults(validatedCountries);
Ultimately, which one you choose might depend on performance. If validation/removal is cheap and rare, then modifying the list in-place might be best. If you're expecting a lot of changes to the list, it might simply be faster to produce a new copy every time.
Regardless, I would suggest you name the method with a little more clarity as well. For example:
public IList<CountryDto> GetValidCountries(IEnumerable<CountryDto> dtos)
public void RemoveInvalidCountries(IList<CountryDto> dtos)
Of course, the naming might be different depending on your actual code context (I suspect ApplyDefaults is a common/inherited method name and not specific to CountryDto)
I'd rather return boolean (or enum in an elaborated case: collection preserved intact,
changed, can't be validated etc.)
// true if the collection is changed, false otherwise
public Boolean ApplyDefaults(IList<CountryDto> dtos) {
Boolean result = false;
//Iterates the collection
//Validates the items in collection
//If items are invalid:
// Removes items e.g dtos.Remove(currentCountryDto)
// result = true;
...
return result;
}
...
if (ApplyDefaults(myData)) {
// Collection is changed, do some extra stuff
}
First of all: you cannot change the reference of the collection you send by parameter, because by default you're getting copy of it. You'd need to use a ref keyword in order to be able to change it.
Secondly: if your method has a return type, than it has to return an object. Your method is not called GetNewCollectionWithAppliedDefaults, but ApplyDefaults which implies that the collection will be modified. You should either return boolean true/false to inform user changes were done or always return parameter's collecion (to allow nested methods calling).
Also, why would you think it doesn't make sense to return a collection? I'd say there's no argument against it. Turn the question around: "why wouldn't I return the collection and could it harm my code"?
Technically, I would say there is not much difference between the two.
However, and as you pointed out, a common used convention is that a function should only return an object it creates. Basically, that would mean that a function that returns an object is generating one while a function which doesn't return anything is modifying the object passed as a parameter.
Again, this is only a convention and it is not widely used within the C# community, but in the python community for example, it is.
Some people, returns a Boolean (or an error code) instead as an indicator of an error (like the old dos command line). I don't like this approach and prefer by far raising exceptions that I can handle later on.
Finally, the best approach in my regard, is to return a value that indicates if a change was done by the function and eventually a value indicating how much of a change was done. It can be a Boolean or it can be the number of inserted/removed elements...
In any case, try to be consistent with the approach you chose, if not in all your code, at least within a single project. Sometimes, you will have no other choice but to abide with the convention used by your teammates.
(My answer is based on the Java viewpoint; C++ and C# programmers might have a different take.) I think it's best to return the collection. The fact that the collection you're returning is the same collection that was given is just an implementation detail, and in future versions of the code, you might want to change that. Document that the collection returned might not be the same one passed in.
If, on the other hand, you want to lock in the design that this method modifies a collection in place, document it that way and don't return the collection. I prefer not to do it this way, but I can see advantages in some contexts.
In your case I would leave void since ApplyDefaults clearly states what its doing.
Also, it might be a good idea to ApplyDefaults in the collection itself. Subclass IList or List or whatever and then you'd call like this:
myCollection.ApplyDefaults();
Which is just obvious.

What is the difference between LinkedList and ArrayList, and when to use which one?

What is the difference between LinkedList and ArrayList? How do I know when to use which one?
The difference is the internal data structure used to store the objects.
An ArrayList will use a system array (like Object[]) and resize it when needed. On the other hand, a LinkedList will use an object that contains the data and a pointer to the next and previous objects in the list.
Different operations will have different algorithmic complexity due to this difference in the internal representation.
Don't use either. Use System.Collections.Generic.List<T>.
That really is my recommendation. Probably independently of what your application is, but here's a little more color just in case you're doing something that needs a finely tuned choice here.
ArrayList and LinkedList are different implementations of the storage mechanism for a List. ArrayList uses an array that it must resize if your collection outgrows it current storage size. LinkedList on the other hand uses the linked list data structure from CS 201. LinkedList is better for some head- or tail-insert heavy workloads, but ArrayList is better for random access workloads.
ArrayList has a good replacement which is List<T>.
In general, List<T> is a wrapper for array - it allows indexing and accessing items in O(1), but, every time you exceed the capacity an O(n) must be paid.
LinkedList<T> won't let you access items using index but you can count that insert will always cost O(1). In addition, you can insert items in to the beginning of the list and between existing items in O(1).
I think that in most cases List<T> is the default choice. Many of the common scenarios don't require special order and have no strict complexity constraints, therefore List<T> is preferred due to its usage simplicity.
The main difference between ArrayList and List<T>, LinkedList<T>, and other similar Generics is that ArrayList holds Objects, while the others hold a type that you specify (ie. List<Point> holds only Points).
Because of this, you need to cast any object you take out of an ArrayList to its actual type. This can take a lot of screen space if you have long class names.
In general it's much better to use List<T> and other typed Generics unless you really need to have a list with multiple different types of objects in it.
The difference lies in the semantics of how the List interface* is implemented:
http://en.wikipedia.org/wiki/Arraylist and http://en.wikipedia.org/wiki/LinkedList
*Meaning the basic list operations
As #sblom has stated, use the generic counterparts of LinkedList and ArrayList. There's really no reason not to, and plenty of reasons to do so.
The List<T> implementation is effectively wrapping an Array. Should the user attempt to insert elements beyond the bounds of the backing array, it will be copied to a larger array (at considerable expense, buit transparently to users of the List<T>)
A LinkedList<T> has a completely different implementation in which data is held in LinkedListNode<T> instances, which carry reference to two other LinkedListNode<T> instances (or only one in the case of the head or tail of the list). No external reference to mid-list items is created. This means that iterating the list is fast, but random-access is slow, because one must iterate the nodes from one end or the other. The best reason to use a LinkedList is to allow for fast inserts, that involve simply changing the references held by the nodes, rather than rewriting the entire list to insert an item (as is the case with List<T>)
They have different performance on "inserts" (adding new elements) and lookups. For inserts ArrayLists keeps an array internally (initially 16 items long) and when you reach the max capacity it doubles the size of the array. An LinkedList starts empty and add an item (node) when needed.
I think also that with ArrayList you are able to index the items, while with the LinkedList you have to "visit" the item from the head (or the LinkedList does this automatically for you).

Return collection as read-only

I have an object in a multi-threaded environment that maintains a collection of information, e.g.:
public IList<string> Data
{
get
{
return data;
}
}
I currently have return data; wrapped by a ReaderWriterLockSlim to protect the collection from sharing violations. However, to be doubly sure, I'd like to return the collection as read-only, so that the calling code is unable to make changes to the collection, only view what's already there. Is this at all possible?
If your underlying data is stored as list you can use List(T).AsReadOnly method.
If your data can be enumerated, you can use Enumerable.ToList method to cast your collection to List and call AsReadOnly on it.
I voted for your accepted answer and agree with it--however might I give you something to consider?
Don't return a collection directly. Make an accurately named business logic class that reflects the purpose of the collection.
The main advantage of this comes in the fact that you can't add code to collections so whenever you have a native "collection" in your object model, you ALWAYS have non-OO support code spread throughout your project to access it.
For instance, if your collection was invoices, you'd probably have 3 or 4 places in your code where you iterated over unpaid invoices. You could have a getUnpaidInvoices method. However, the real power comes in when you start to think of methods like "payUnpaidInvoices(payer, account);".
When you pass around collections instead of writing an object model, entire classes of refactorings will never occur to you.
Note also that this makes your problem particularly nice. If you don't want people changing the collections, your container need contain no mutators. If you decide later that in just one case you actually HAVE to modify it, you can create a safe mechanism to do so.
How do you solve that problem when you are passing around a native collection?
Also, native collections can't be enhanced with extra data. You'll recognize this next time you find that you pass in (Collection, Extra) to more than one or two methods. It indicates that "Extra" belongs with the object containing your collection.
If your only intent is to get calling code to not make a mistake, and modify the collection when it should only be reading all that is necessary is to return an interface which doesn't support Add, Remove, etc.. Why not return IEnumerable<string>? Calling code would have to cast, which they are unlikely to do without knowing the internals of the property they are accessing.
If however your intent is to prevent the calling code from observing updates from other threads you'll have to fall back to solutions already mentioned, to perform a deep or shallow copy depending on your need.
I think you're confusing concepts here.
The ReadOnlyCollection provides a read-only wrapper for an existing collection, allowing you (Class A) to pass out a reference to the collection safe in the knowledge that the caller (Class B) cannot modify the collection (i.e. cannot add or remove any elements from the collection.)
There are absolutely no thread-safety guarantees.
If you (Class A) continue to modify the underlying collection after you hand it out as a ReadOnlyCollection then class B will see these changes, have any iterators invalidated, etc. and generally be open to any of the usual concurrency issues with collections.
Additionally, if the elements within the collection are mutable, both you (Class A) and the caller (Class B) will be able to change any mutable state of the objects within the collection.
Your implementation depends on your needs:
- If you don't care about the caller (Class B) from seeing any further changes to the collection then you can just clone the collection, hand it out, and stop caring.
- If you definitely need the caller (Class B) to see changes that are made to the collection, and you want this to be thread-safe, then you have more of a problem on your hands. One possibility is to implement your own thread-safe variant of the ReadOnlyCollection to allow locked access, though this will be non-trivial and non-performant if you want to support IEnumerable, and it still won't protect you against mutable elements in the collection.
One should note that aku's answer will only protect the list as being read only. Elements in the list are still very writable. I don't know if there is any way of protecting non-atomic elements without cloning them before placing them in the read only list.
You can use a copy of the collection instead.
public IList<string> Data {
get {
return new List<T>(data);
}}
That way it doesn't matter if it gets updated.
You want to use the yield keyword. You loop through the IEnumerable list and return the results with yeild. This allows the consumer to use the for each without modifying the collection.
It would look something like this:
List<string> _Data;
public IEnumerable<string> Data
{
get
{
foreach(string item in _Data)
{
return yield item;
}
}
}

Threadsafe foreach enumeration of lists

I need to enumerate though generic IList<> of objects. The contents of the list may change, as in being added or removed by other threads, and this will kill my enumeration with a "Collection was modified; enumeration operation may not execute."
What is a good way of doing threadsafe foreach on a IList<>? prefferably without cloning the entire list. It is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures your list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading/writing to it.
There is no such operation. The best you can do is
lock(collection){
foreach (object o in collection){
...
}
}
Your problem is that an enumeration does not allow the IList to change. This means you have to avoid this while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on.
Serialize the access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
ICollection MyCollection;
// Instantiate and populate the collection
lock(MyCollection.SyncRoot) {
// Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that's a very interesting topic.
The best approach relies on the ReadWriteResourceLock which use to have big performance issues due to the so called Convoy Problem.
The best article I've found treating the subject is this one by Jeffrey Richter which exposes its own method for a high performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy while simultaniously adding and removing elements.
Could you clarify a few things? Are insertions and deletions happening only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near or on the location of the enumeration's current element?
This is certainly doable by creating a custom IEnumerable object with perhaps an integer index, but only if you can control all access to your IList<> object (for locking and maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex probablem.
Forech depends on the fact that the collection will not change. If you want to iterate over a collection that can change, use the normal for construct and be prepared to nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
Default behavior for a simple indexed data structure like a linked list, b-tree, or hash table is to enumerate in order from the first to the last. It would not cause a problem to insert an element in the data structure after the iterator had already past that point or to insert one that the iterator would enumerate once it had arrived, and such an event could be detected by the application and handled if the application required it. To detect a change in the collection and throw an error during enumeration I could only imagine was someone's (bad) idea of doing what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly. They have called their shiny new unbroken collections ConcurrentCollections (System.Collections.Concurrent) in .NET 4.0.
I recently spend some time multip-threading a large application and had a lot of issues with the foreach operating on list of objects shared across threads.
In many cases you can use the good old for-loop and immediately assign the object to a copy to use inside the loop. Just keep in mind that all threads writing to the objects of your list should write to different data of the objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach(var p in Points)
{
// work with p...
}
Can be replaced by:
for(int i = 0; i < Points.Count; i ++)
{
Point p = Points[i];
// work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock, that allows multiple concurrent readers but also a single writer (when there are no readers).
This is something that I've recently had to deal with and to me it really depends on what you're doing with the list.
If you need to use the list at a point in time (given the number of elements currently in it) AND another thread can only ADD to the end of the list, then maybe you just switch out to a FOR loop with a counter. At the point you grab the counter, you're only seeing X numbers of elements in the list. You can walk through the list (while others are adding to the end of it) . . . should not cause a problem.
Now, if the list needs to have items taken OUT of it by other threads, or CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList - so you may need refactor for a dictionary).

Categories

Resources