Some C# collections have a Count property and others have a Length property. Is there a rule of thumb for telling which one has which, and why the discrepancy?
I'd say the general rule of thumb is the following:
Count is for collections with a variable length, e.g. Lists (it comes from ICollection).
Length is for fixed-length collections, e.g. arrays, or other immutable objects, e.g. string.
UPDATE:
Just to elaborate, Count comes from ICollection and doesn't always indicate variability. For example (as per Greg Beech's comment), ReadOnlyCollection<T> has the Count property but is not variable; it does, however, implement ICollection.
Perhaps a more exact rule of thumb would be:
Count indicates that something implements ICollection.
Length indicates immutability.
If the type implements ICollection it will have the Count property. Length on the other hand is not standard and is defined as a property of the Array class so all fixed size arrays will have it as well.
As others have said Count comes from ICollection and Length is specifically defined on certain types, typically types that are immutable such as String and Array.
To me, Count implies mutability as the count of something can easily change. Length feels more immutable. The length of a given object usually doesn't change without drastic measure.
Also keep in mind there is the extension method Count() defined in LINQ, which works on any IEnumerable<T> and so provides a common interface over both of these properties. LINQ is smart enough to evaluate Count() as efficiently as it can (i.e. if the source implements ICollection<T> or ICollection, as List<T> and arrays do, it reads the Count property instead of enumerating), so it's a decent alternative.
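To make the difference concrete, here is a minimal sketch (the variable names are only illustrative) showing the Length and Count properties next to the LINQ Count() extension method:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] array = { 1, 2, 3 };
var list = new List<int> { 1, 2, 3 };
string text = "abc";

// Properties defined on the types themselves.
Console.WriteLine(array.Length); // 3 (defined on Array)
Console.WriteLine(list.Count);   // 3 (from ICollection<T>)
Console.WriteLine(text.Length);  // 3 (defined on String)

// The LINQ extension method works on anything that is IEnumerable<T>.
Console.WriteLine(array.Count()); // 3
Console.WriteLine(list.Count());  // 3
Console.WriteLine(text.Count());  // 3 (a string is an IEnumerable<char>)
```

Note the parentheses: list.Count is the property, list.Count() is the extension method.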
So with the introduction of lists, we can easily do what we used to do with IEnumerable classes. So are IEnumerable classes now mostly useful only for understanding how lists work internally? Am I right?
No. While a list is enumerable, the reverse is not quite true.
IEnumerable as an interface states that you can iterate over a series of elements, whereas IList states that the element's order is preserved and you can randomly access them [in an efficient fashion – in .NET]. Although both of these revolve around a collection-ish data structure, the promises the interfaces make are different. That said, many types offer both and therefore implement both interfaces (List<T>, T[], ...).
Think of a data reader or a function computing a series of values (Fibonacci, prime numbers, ...), sets or trees for instance. None of these are lists, yet you can iterate over them using a method that takes an IEnumerable rather than an IList as a parameter.
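As a rough illustration of a sequence that is enumerable but not a list, here is a sketch of a Fibonacci generator written as an iterator. The names Fibonacci and SumFirst are made up for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An endless sequence: it has no Count and no indexer, so it cannot be an IList<T>.
static IEnumerable<long> Fibonacci()
{
    long a = 0, b = 1;
    while (true)
    {
        yield return a;
        (a, b) = (b, a + b);
    }
}

// A consumer that only needs to iterate, so it asks for IEnumerable<T>.
static long SumFirst(IEnumerable<long> values, int count) => values.Take(count).Sum();

Console.WriteLine(string.Join(", ", Fibonacci().Take(8))); // 0, 1, 1, 2, 3, 5, 8, 13
Console.WriteLine(SumFirst(Fibonacci(), 8));               // 33

// The same consumer also accepts a real list, because List<T> is enumerable too.
Console.WriteLine(SumFirst(new List<long> { 10, 20, 30 }, 2)); // 30
```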
Despite not strictly matching the tags, it may be worth mentioning that the interface semantics are slightly different in Java. The List interface does not specify random access. The Iterable interface though, which is the counterpart of IEnumerable, is similar.
I would suggest that interfaces like IEnumerable, ICollection and IList (in generic and non-generic forms, though ICollection and ICollection<T> have little relation to each other) are best thought of as ways by which objects can allow themselves to be used by code which is designed to do various things with sequences of items (which may be countable or addressable by index). Code which simply wants to create a counted ordered sequence of items which is addressable by index can simply use List<T>, and can use that type whether or not it needs something which is counted and addressable by index, but an object which wants to make itself usable by code expecting a sequence of items can implement one of the above interfaces directly, rather than having to copy its contents to a new List<T> every time anybody wants to enumerate them.
Note that one major advantage of IEnumerable<T> is that types which implement it do not have to hold the "entire contents" in memory, and can instead generate elements "on the fly". If one has a method that does something with all the items in an IEnumerable<T>, and one wants to perform that action on every item in a 1,000,000-item list where Fnord is greater than 23 (which is true of 95% of the items), having the method accept IEnumerable<T> will avoid any need to create a 950,000-item list containing all the items where Fnord is greater than 23.
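A small sketch of that scenario follows. The Item record, the SumFnord method and the way the data is generated are all assumptions chosen so that 95% of the items have Fnord greater than 23:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// 1,000,000 items with Fnord in 23..42, so 19 out of every 20 have Fnord > 23.
List<Item> items = Enumerable.Range(0, 1_000_000)
    .Select(i => new Item(23 + (i % 20)))
    .ToList();

// Where is deferred: this builds a filter, not a 950,000-item list.
IEnumerable<Item> filtered = items.Where(item => item.Fnord > 23);

// The method accepts IEnumerable<T>, so matching items stream through one at a time.
static long SumFnord(IEnumerable<Item> source)
{
    long total = 0;
    foreach (Item item in source)
    {
        total += item.Fnord;
    }
    return total;
}

Console.WriteLine(SumFnord(filtered));
Console.WriteLine(filtered.Count()); // 950000

record Item(int Fnord);
```

Had SumFnord required a List<Item>, the caller would have needed items.Where(...).ToList(), allocating the intermediate list first.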
Suppose I want to be able to compare 2 lists of ints and treat one particular value as a wild card.
e.g.
If -1 is a wild card, then
{1,2,3,4} == {1,2,-1,4} //returns true
And I'm writing a class to wrap all this logic, so it implements IEquatable and has the relevant logic in public override bool Equals()
But I have always thought that you more-or-less had to implement GetHashCode if you were overriding .Equals(). Granted it's not enforced by the compiler, but I've been under the impression that if you don't then you're doing it wrong.
Except I don't see how I can implement .GetHashCode() without either breaking its contract (objects that are Equal have different hashes), or just having the implementation be return 1.
Thoughts?
This implementation of Equals is already invalid, as it is not transitive. You should probably leave Equals with the default implementation, and write a new method like WildcardEquals (as suggested in the other answers here).
In general, whenever you have changed Equals, you must implement GetHashCode if you want to be able to store the objects in a hashtable (e.g. a Dictionary<TKey, TValue>) and have it work correctly. If you know for certain that the objects will never end up in a hashtable, then it is in theory optional (but it would be safer and clearer in that case to override it to throw a "NotSupportedException" or always return 0).
The general contract is to always implement GetHashCode if you override Equals, as you can't always be sure in advance that later users won't put your objects in hashtables.
In this case, I would create a new or extension method, WildcardEquals(other), instead of using the operators.
I wouldn't recommend hiding this kind of complexity.
From a logical point of view, we break the concept of equality. It is not transitive any longer. So in case of wildcards, A==B and B==C does not mean that A==C.
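The loss of transitivity can be seen in a short sketch. The WildcardEquals extension method below is just one possible shape for the separate method suggested above, assuming -1 is the wildcard as in the question:

```csharp
using System;
using System.Collections.Generic;

var a = new List<int> { 1, 2, 3, 4 };
var b = new List<int> { 1, 2, -1, 4 };
var c = new List<int> { 1, 2, 5, 4 };

Console.WriteLine(a.WildcardEquals(b)); // True
Console.WriteLine(b.WildcardEquals(c)); // True
Console.WriteLine(a.WildcardEquals(c)); // False: A==B and B==C, but not A==C

static class WildcardExtensions
{
    // Assumption: -1 is the wildcard value, as in the question.
    public const int Wildcard = -1;

    public static bool WildcardEquals(this IReadOnlyList<int> left, IReadOnlyList<int> right)
    {
        if (left.Count != right.Count) return false;
        for (int i = 0; i < left.Count; i++)
        {
            bool eitherIsWildcard = left[i] == Wildcard || right[i] == Wildcard;
            if (!eitherIsWildcard && left[i] != right[i]) return false;
        }
        return true;
    }
}
```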
From a technical point of view, returning the same value from GetHashCode() is not something unforgivable.
The only possible idea I see is to exploit at least the length, e.g.:
public override int GetHashCode()
{
return this.Length.GetHashCode();
}
It's recommended, but not mandatory at all. If you don't need that custom implementation of GetHashCode, just don't do it.
GetHashCode is generally only important if you're going to be storing elements of your class in some kind of collection, such as a set. If that's the case here then I don't think you're going to be able to achieve consistent semantics since, as @AlexD points out, equality is no longer transitive.
For example, (using string globs rather than integer lists) if you add the strings "A", "B", and "*" to a set, your set will end up with either one or two elements depending on the order you add them in.
If that's not what you want then I'd recommend putting the wildcard matching into a new method (e.g. EquivalentTo()) rather than overloading equality.
Having GetHashCode() always return a constant is the only 'legal' way of fulfilling the equals/hashcode constraint.
It'll potentially be inefficient if you put it in a hashmap, or similar, but that might be fine (non-equal hashcodes imply non-equality, but equal hashcodes imply nothing).
I think this is the only possible valid option there. Hashcodes essentially exist as keys to look things up by quickly, and since your wildcard must match every item, its key for lookup must equal every item's key, so they must all be the same.
As others have noted though, this isn't what equals is normally for, and breaks assumptions that many other things may use for equals (such as transitivity - EDIT: turns out this is actually contractual requirement, so no-go), so it's definitely worth at least considering comparing these manually, or with an explicitly separate equality comparer.
Since you've changed what "equals" means (adding in wildcards changes things dramatically) then you're already outside the scope of the normal use of Equals and GetHashCode. It's just a recommendation and in your case it seems like it doesn't fit. So don't worry about it.
That said, make sure you're not using your class in places that might use GetHashCode. That can get you in a load of trouble and be hard to debug if you're not watching for it.
It is generally expected that Equals(Object) and IEquatable<T>.Equals(T) should implement equivalence relations, such that if X is observed to be equal to Y, and Y is observed to be equal to Z, and none of the items have been modified, X may be assumed to be equal to Z; additionally, if X is equal to Y and Y does not equal Z, then X may be assumed not to equal Z either. Wild-card and fuzzy comparison methods do not implement equivalence relations, and thus Equals should generally not be implemented with such semantics.
Many collections will kinda-sorta work with objects that implement Equals in a way that doesn't implement an equivalence relation, provided that any two objects that might compare equal to each other always return the same hash code. Doing this will often require that many things that would compare unequal to return the same hash code, though depending upon what types of wildcard are supported it may be possible to separate items to some degree.
For example, if the only wildcard which a particular string supports represents "arbitrary string of one or more digits", one could hash the string by converting all sequences of consecutive digits and/or string-of-digit wildcard characters into a single "string of digits" wildcard character. If # represents any digit, then the strings abc123, abc#, abc456, and abc#93#22#7 would all be hashed to the same value as abc#, but abc#b, abc123b, etc. could hash to a different value. Depending upon the distribution of strings, such distinctions may or may not yield better performance than returning a constant value.
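A minimal sketch of that hashing idea, assuming # is the only wildcard character. Normalize and WildcardHash are hypothetical helper names, not part of any library:

```csharp
using System;
using System.Text.RegularExpressions;

// Collapse every run of digits and/or '#' wildcards into a single '#'.
static string Normalize(string s) => Regex.Replace(s, "[0-9#]+", "#");

// Hash the normalized form, so strings that could match each other share a hash code.
static int WildcardHash(string s) => StringComparer.Ordinal.GetHashCode(Normalize(s));

foreach (string s in new[] { "abc123", "abc#", "abc456", "abc#93#22#7", "abc#b", "abc123b" })
{
    Console.WriteLine($"{s} -> {Normalize(s)}");
}
// The first four strings normalize to "abc#"; the last two normalize to "abc#b".
```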
Note that even if one implements GetHashCode in such a fashion that equal objects yield equal hashes, some collections may still behave oddly if the equality method doesn't implement an equivalence relation. For example, if a collection foo contains items with keys "abc1" and "abc2", attempts to access foo["abc#"] might arbitrarily return the first item or the second. Attempts to delete the key "abc#" may arbitrarily remove one or both items, or may fail after deleting one item (its expected post-condition wouldn't be met, since abc# would be in the collection even after deletion).
Rather than trying to jinx Equals to compare hash-code equality, an alternative approach is to have a dictionary which holds for each possible wildcard string that would match at least one main-collection string a list of the strings it might possibly match. Thus, if there are many strings which would match abc#, they could all have different hash codes; if a user enters "abc#" as a search request, the system would look up "abc#" in the wild-card dictionary and receive a list of all strings matching that pattern, which could then be looked up individually in the main dictionary.
I currently have a class that holds 3 dictionaries, each of which contains Lists of the same type within each dictionary, but different types across the dictionaries, such as:
Dictionary1<string, List<int>> ...
Dictionary2<string, List<double>>...
Dictionary3<string, List<DateTime>>...
Is there a way to use a different collection that can hold all the Lists so that I can iterate through the collection of Lists? Being able to iterate is the only requirement of such collection, no sorting, no other operations will be needed.
I want to be able to access the List directly through the string or other identifier and access the List's members. Type safety is not a requirement but in exchange I do not want to have to cast anything, speed is the absolutely top priority here.
So, when calculations are performed on the list's members knowledge of the exact type is assumed, such as "double lastValue = MasterCollection["List1"].Last();", whereas it is assumed that List1 is a List of type double.
Can this be accomplished? Sorry that I may use sometimes incorrect or incomplete terminology I am not a trained programmer or developer.
Thanks,
Matt
To do that you would have to use a non-generic API, such as IList (not IList<T>) - i.e. Dictionary<string, IList>. Or since you just need to iterate, maybe just IEnumerable (not IEnumerable<T>). However! That will mean that you are talking non-generic, so some sacrifices may be necessary (boxing of value types during retrieval, etc).
With an IList/IEnumerable approach, to tweak your example:
double lastValue = MasterCollection["List1"].Cast<double>().Last();
You could, of course, write some custom extension methods on IDictionary<string,IList>, allowing something more like:
double lastValue = MasterCollection.Get<double>("List1").Last();
I'm not sure it is worth it, though.
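For what it's worth, such an extension method could look roughly like the sketch below. Get<T> is not a framework method; it is a hypothetical helper that simply hides the cast:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

var masterCollection = new Dictionary<string, IList>
{
    ["List1"] = new List<double> { 1.5, 2.5, 3.5 },
    ["List2"] = new List<int> { 1, 2, 3 },
};

double lastValue = masterCollection.Get<double>("List1").Last();
Console.WriteLine(lastValue);

static class MasterCollectionExtensions
{
    // Hypothetical helper: the caller states the element type, and the cast
    // throws InvalidCastException if the stored list has a different one.
    public static IList<T> Get<T>(this IDictionary<string, IList> source, string key)
        => (IList<T>)source[key];
}
```

The cast still happens, once per lookup; it is just no longer visible at the call site. Because the stored object is cast back to IList<T>, element access afterwards does not box.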
No, what you are trying to do is not possible; namely, the requirement for strong-typing on all of the lists without casting is what's preventing the rest.
If your only requirement is to iterate through each of the items in the list, then you could create your dictionary as a Dictionary<string, IEnumerable> (note the non-generic interface). IEnumerable<T> derives from IEnumerable which would allow you to iterate through each item in the list.
The problem with this is that you would have to perform a cast at some point either to the IEnumerable<T> (assuming you know you are working with it) or use the Cast<T> extension method on the Enumerable class (the latter being worse, as you might incur boxing/unboxing, unless it does type-sniffing, in which case, you wouldn't have a performance penalty).
I would say that you shouldn't store the items in a single list; your example usage shows that you know the type ahead of time (you are assigning to a double) so you are aware at that point in time of the specific typed list.
It's not worth losing type-safety over.
Here's my strategy for choosing which C# collection type to use:
if number of items in collection is fixed, then use an array, e.g.:
string[] directions = new string[] { "north", "south", "east", "west" };
otherwise always use List<T>
unless of course you need a more specialized collection, e.g. Stack<T>, Queue<T>, or Dictionary<TKey, TValue>
but never use ArrayList anymore
Based on your experience, what is missing from this strategy?
Your rules work fine.
In addition:
Always prefer generic (or specialized) collections over non-generic (object-based) ones.
Use HashSet<T> if you want to check for mere existence instead of key-value-mappings (which is represented through Dictionary).
In case of dictionaries, consider the use of ordered maps (SortedList<...>, SortedDictionary<...>) if ordering seems important.
Use linked lists if you have remove/insert operations in the middle.
and of course the most important one:
Never export concrete collection types: Always use the most general interface, which is, in most cases, IList<T> or IEnumerable<T>.
I'd say that in the majority of cases, even if I knew the number of items in a collection, I'd use List, simply for the number of utility functions it provides and compatibility with LINQ.
Hashmaps are an important use case, for when you would want faster access to items within the collection.
You can also use Collection<T> if you plan to customize add/remove/clear behavior in an inherited class, because it exposes protected virtual methods (InsertItem, RemoveItem, ClearItems, SetItem) for that purpose.
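As a sketch of that approach, the made-up NonEmptyStringCollection class below rejects empty strings by overriding two of those virtual methods:

```csharp
using System;
using System.Collections.ObjectModel;

var names = new NonEmptyStringCollection();
names.Add("north");

try
{
    names.Add(""); // Add is routed through InsertItem, so the check runs here too.
}
catch (ArgumentException)
{
    Console.WriteLine("rejected");
}

Console.WriteLine(names.Count); // 1

class NonEmptyStringCollection : Collection<string>
{
    // Called by both Add and Insert.
    protected override void InsertItem(int index, string item)
    {
        Validate(item);
        base.InsertItem(index, item);
    }

    // Called by the indexer setter.
    protected override void SetItem(int index, string item)
    {
        Validate(item);
        base.SetItem(index, item);
    }

    private static void Validate(string item)
    {
        if (string.IsNullOrEmpty(item))
        {
            throw new ArgumentException("Empty strings are not allowed.", nameof(item));
        }
    }
}
```

List<T> does not offer these hooks, since its Add and Insert methods are not virtual.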
I am running through some tests about using ArrayLists and List.
Speed is very important in my app.
I have tested creating 10000 records in each, finding an item by index and then updating that object for example:
List[i] = newX;
Using the arraylist seems much faster. Is that correct?
UPDATE:
For my List<T> approach, I find the index with a lambda and then use List[i], e.g.:
....
int index = base.FindIndex(x => x.AlpaNumericString == "PopItem");
base[index] = UpdatedItem;
It is definitely slower than
int index = ArrayList.IndexOf("PopItem");
base[index] = UpdatedItem;
A generic List (List<T>) should always be quicker than an ArrayList.
Firstly, an ArrayList is not strongly-typed and accepts types of object, so if you're storing value types in the ArrayList, they are going to be boxed and unboxed every time they are added or accessed.
A Generic List can be defined to accept only (say) ints, so no boxing or unboxing needs to occur when adding/accessing elements of the list.
If you're dealing with reference types, you're probably still better off with a Generic List over an ArrayList, since although there's no boxing/unboxing going on, your Generic List is type-safe, and there will be no implicit (or explicit) casts required when retrieving your strongly-typed object from the ArrayList's "collection" of object types.
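A minimal sketch of the difference at the call site (variable names are illustrative only):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// ArrayList stores object, so an int is boxed on the way in...
var boxedValues = new ArrayList();
boxedValues.Add(42);
// ...and needs an explicit cast (unboxing) on the way out.
int fromArrayList = (int)boxedValues[0];

// List<int> stores ints directly: no boxing and no cast.
var typedValues = new List<int>();
typedValues.Add(42);
int fromList = typedValues[0];

// ArrayList also accepts anything, so type mistakes only surface at run time.
boxedValues.Add("not a number");   // compiles fine
// typedValues.Add("not a number"); // would not compile

Console.WriteLine(fromArrayList + fromList); // 84
```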
There may be some edge-cases where an ArrayList is faster performing than a Generic List, however, I (personally) have not yet come across one. Even the MSDN documentation states:
Performance Considerations
In deciding whether to use the List<T> or ArrayList class, both of which have similar functionality, remember that the List<T> class performs better in most cases and is type safe. If a reference type is used for type T of the List<T> class, the behavior of the two classes is identical. However, if a value type is used for type T, you need to consider implementation and boxing issues.
If a value type is used for type T, the compiler generates an implementation of the List<T> class specifically for that value type. That means a list element of a List<T> object does not have to be boxed before the element can be used, and after about 500 list elements are created the memory saved not boxing list elements is greater than the memory used to generate the class implementation.
Make certain the value type used for type T implements the IEquatable<T> generic interface. If not, methods such as Contains must call the Object.Equals(Object) method, which boxes the affected list element. If the value type implements the IComparable interface and you own the source code, also implement the IComparable<T> generic interface to prevent the BinarySearch and Sort methods from boxing list elements. If you do not own the source code, pass an IComparer<T> object to the BinarySearch and Sort methods.
Moreover, I particularly like the very last section of that paragraph, which states:
It is to your advantage to use the type-specific implementation of the List<T> class instead of using the ArrayList class or writing a strongly typed wrapper collection yourself. The reason is your implementation must do what the .NET Framework does for you already, and the common language runtime can share Microsoft intermediate language code and metadata, which your implementation cannot.
Touché! :)
Based on your recent edit it seems as though you're not performing a 1:1 comparison here. In the List you have a class object and you're looking for the index based on a property, whereas in the ArrayList you just store the values of that property. If so, this is a severely flawed comparison.
To make it a 1:1 comparison you would add the values to the list only, not the class. Or, you would add the class items to the ArrayList. The former would allow you to use IndexOf on both collections. The latter would entail looping through your entire ArrayList and comparing each item till a match was found (and you could do the same for the List), or overriding object.Equals since ArrayList uses that for comparison.
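The two like-for-like setups could be sketched as follows. The Item class is a stand-in for the asker's type, keeping the AlpaNumericString property name from the question:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Option 1: both collections hold only the string values, so IndexOf works on both.
var listOfValues = new List<string> { "A", "PopItem", "C" };
var arrayListOfValues = new ArrayList { "A", "PopItem", "C" };
int i1 = listOfValues.IndexOf("PopItem");
int i2 = arrayListOfValues.IndexOf("PopItem");

// Option 2: both collections hold the class, so both must search by property.
var listOfItems = new List<Item> { new Item("A"), new Item("PopItem"), new Item("C") };
var arrayListOfItems = new ArrayList { new Item("A"), new Item("PopItem"), new Item("C") };
int i3 = listOfItems.FindIndex(x => x.AlpaNumericString == "PopItem");
int i4 = -1;
for (int i = 0; i < arrayListOfItems.Count; i++)
{
    // ArrayList hands back object, so each element needs a cast before comparing.
    if (((Item)arrayListOfItems[i]).AlpaNumericString == "PopItem")
    {
        i4 = i;
        break;
    }
}

Console.WriteLine($"{i1} {i2} {i3} {i4}"); // 1 1 1 1

// Hypothetical stand-in for the asker's class.
class Item
{
    public Item(string value) { AlpaNumericString = value; }
    public string AlpaNumericString { get; }
}
```

Timing either pair against each other compares the collections themselves, not two different search strategies.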
For an interesting read, I suggest taking a look at Rico Mariani's post: Performance Quiz #7 -- Generics Improvements and Costs -- Solution. Even in that post Rico also emphasizes the need to benchmark different scenarios. No blanket statement is issued about ArrayLists, although the general consensus is to use generic lists for performance, type safety, and having a strongly typed collection.
Another related article is: Why should I use List and not ArrayList.
ArrayList seems faster? According to the documentation ( http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx ) List should be faster when using a value type, and the same speed when using a reference type. ArrayList is slower with value types because it needs to box/unbox the values when you're accessing them.
I would expect them to be about the same if they are value-types. There is an extra cast/type-check for ArrayList, but nothing huge. Of course, List<T> should be preferred. If speed is the primary concern (which it almost always isn't, at least not in this way), then you might also want to profile an array (T[]) - harder (=more expensive) to add/remove, of course - but if you are just querying/assigning by index, it should be the fastest. I have had to resort to arrays for some very localised performance critical work, but 99.95% of the time this is overkill and should be avoided.
For example, for any of the 3 approaches (List<T>/ArrayList/T[]) I would expect the assignment cost to be insignificant to the cost of newing up the new instance to put into the storage.
Marc Gravell touched on this in his answer - I think it needs to be stressed.
It is usually a waste of time to prematurely optimize your code!
A better approach is to do a simple, well designed first implementation, and test it with anticipated real world data loads.
Often, you will find that it's "fast enough". (It helps to start out with a clear definition of "fast enough" - e.g. "Must be able to find a single CD in a 10,000 CD collection in 3 seconds or less")
If it's not, put a profiler on it. Almost invariably, the bottleneck will NOT be where you expect.
(I learned this the hard way when I brought a whole app to its knees with a single badly chosen string concatenation)