Does there exist an implementation for .NET of a list collection such that both insert and lookup are worst-case O(log(n)) operations? The default System.Collections.Generic.List 'Insert' method is an O(n) operation.
By a list collection, I mean an array-like expandable data structure. By 'lookup' I mean access by index.
I suspect this can be done with balanced trees, but would be non-trivial to implement.
I do not know of a .NET implementation, but a data structure that might work for you is an indexable skip list. It has O(log n) performance similar to a balanced binary tree, but is conceptually more like a linked list.
http://en.wikipedia.org/wiki/Skip_list
I don't think it would be too hard to write one in C#.
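For a rough idea of how the indexing works, here is a minimal sketch of an indexable skip list, assuming the width-per-link scheme from the Wikipedia article: every forward link records how many positions it skips, so both index lookup and positional insert walk the widths in expected O(log n). The class and member names are invented for illustration; treat it as untested sketch code, not a drop-in implementation.

using System;

public class IndexableSkipList<T>
{
    private const int MaxLevel = 32;
    private readonly Random _random = new Random();
    private readonly Node _head = new Node(default(T), MaxLevel);
    private int _levels = 1;

    public int Count { get; private set; }

    private class Node
    {
        public readonly T Value;
        public readonly Node[] Next;  // Next[l] = following node at level l
        public readonly int[] Width;  // Width[l] = positions skipped by Next[l]
        public Node(T value, int levels)
        {
            Value = value;
            Next = new Node[levels];
            Width = new int[levels];
        }
    }

    // Expected O(log n): descend the levels, consuming link widths.
    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new ArgumentOutOfRangeException("index");
            Node node = _head;
            int remaining = index + 1;  // head sits one position before element 0
            for (int l = _levels - 1; l >= 0; l--)
            {
                while (node.Next[l] != null && node.Width[l] <= remaining)
                {
                    remaining -= node.Width[l];
                    node = node.Next[l];
                }
            }
            return node.Value;
        }
    }

    // Expected O(log n) insert at an arbitrary index.
    public void Insert(int index, T value)
    {
        if (index < 0 || index > Count)
            throw new ArgumentOutOfRangeException("index");
        int newLevels = 1;  // flip coins for the new node's height
        while (newLevels < MaxLevel && _random.Next(2) == 0)
            newLevels++;
        if (newLevels > _levels)
            _levels = newLevels;

        Node newNode = new Node(value, newLevels);
        Node node = _head;
        int pos = 0;  // how many elements we have passed so far
        for (int l = _levels - 1; l >= 0; l--)
        {
            while (node.Next[l] != null && pos + node.Width[l] <= index)
            {
                pos += node.Width[l];
                node = node.Next[l];
            }
            if (l < newLevels)
            {
                // Splice the new node in at this level, splitting the old width.
                newNode.Next[l] = node.Next[l];
                newNode.Width[l] = pos + node.Width[l] - index;
                node.Next[l] = newNode;
                node.Width[l] = index - pos + 1;
            }
            else if (node.Next[l] != null)
            {
                node.Width[l]++;  // this link now skips one more element
            }
        }
        Count++;
    }
}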
The C5 TreeSet should give you a red/black implementation with those characteristics, including index access.
I don't know if one exists in the .NET Framework, but you could implement an AA tree; insert and search are both O(log n).
There is no built-in solution for this if you need to access elements by index. You can use a SortedList, but then insertion is O(n), or you can use a SortedDictionary, but then you lose the array-like access (by index).
var usedIds = list.Count > 20 ? new HashSet<int>() as ICollection<int> : new List<int>();
Assuming that List is more performant with 20 or fewer items and HashSet is more performant with a greater item count (from this post), is it an efficient approach to switch between collection types dynamically based on the predicted item count?
All of the operations performed will be the same for either collection type.
PS: I have also found the HybridDictionary class, which seems to do the same thing automatically, but I've never used it, so I have no info on its performance either.
EDIT: My collection is mostly used as a buffer, with many inserts and gets.
In theory, it could be, depending on how many and what type of operations you are performing on the collections. In practice, it would be a pretty rare case where such micro-optimization would justify the added complexity.
Also consider what type of data you are working with. If you are using int as the collection item, as the first line of your question suggests, then the threshold below which List is still faster than HashSet for many operations is going to be quite a bit lower than 20.
In any case, if you are going to do that, I would create a new collection class to handle it, something along the lines of the HybridDictionary, and expose it to your user code with some generic interface like IDictionary.
And make sure you profile it to be sure that your use case actually benefits from it.
There may even be a better option than either of those collections, depending on what exactly it is you are doing. i.e. if you are doing a lot of "before or after" inserts and traversals, then LinkedList might work better for you.
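If you do go down that road, a minimal sketch of the idea might look like the following. The class name and threshold are invented for illustration, and note that the switch silently changes duplicate semantics once the HashSet takes over, which is exactly the usability problem raised in a later answer:

using System.Collections;
using System.Collections.Generic;

public class HybridCollection<T> : ICollection<T>
{
    private const int Threshold = 20;  // assumed cut-over point; profile it
    private ICollection<T> _inner = new List<T>();

    public void Add(T item)
    {
        // Migrate from List to HashSet once we grow past the threshold.
        if (_inner is List<T> && _inner.Count >= Threshold)
            _inner = new HashSet<T>(_inner);
        _inner.Add(item);
    }

    public bool Contains(T item) { return _inner.Contains(item); }
    public bool Remove(T item) { return _inner.Remove(item); }
    public void Clear() { _inner = new List<T>(); }
    public void CopyTo(T[] array, int arrayIndex) { _inner.CopyTo(array, arrayIndex); }
    public int Count { get { return _inner.Count; } }
    public bool IsReadOnly { get { return false; } }
    public IEnumerator<T> GetEnumerator() { return _inner.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}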
Hash tables like HashSet<T> and Dictionary<K,V> are faster at searching and inserting items in any order.
Arrays T[] are best used if you always have a fixed size and a lot of indexing operations. Writing reference-type items into an array is slowed somewhat by the run-time covariance check that C# arrays perform.
List<T> is best used for dynamically sized collections with indexing operations.
I don't think it is a good idea to write something like the hybrid collection; better to use a collection suited to your requirements. If you have a buffer with a lot of index-based operations, I would not suggest a hash table; as somebody already noted, a hash table by design uses more memory.
HashSet is for faster access, but List is for inserts. If you don't plan on adding new items, use HashSet; otherwise, List.
If your collection is very small then the performance is virtually always going to be a non-issue. If you know that n is always less than 20, O(n) is, by definition, O(1). Everything is fast for small n.
Use the data structure that most appropriate represents how you are conceptually treating the data, the type of operations that you need to perform, and the type of operations that should be most efficient.
is it an efficient approach to switch between collection types dynamically based on the predicted item count?
It can be, depending on what you mean by "efficiency" (MS offers the HybridDictionary class for that, though unfortunately it is non-generic). But irrespective of that, it's mostly a bad choice. I will explain both points.
From an efficiency standpoint:
Addition will always be faster in a List<T>, since a HashSet<T> has to compute the item's hash code and store it. Even though removal and lookup become faster with a HashSet<T> as the size grows, addition to the end is where List<T> wins. You will have to decide which is more important to you.
HashSet<T> comes with a memory overhead compared to List<T>. See this for some illustration.
However, from a usability standpoint it need not make sense. A HashSet<T> is a set, unlike List<T>, which is a bag. They are very different, and their uses are very different. For instance:
HashSet<T> cannot have duplicates.
HashSet<T> will not care about any order.
So when you return a hybrid ICollection<T>, your requirement goes like this: "It doesn't matter whether duplicates can be added or not; sometimes let them be added, sometimes not. And of course iteration order is not important anyway." That is very rarely useful.
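To make the difference concrete (a trivial made-up example):

var asList = new List<int> { 1, 2, 2, 3 };    // a bag: keeps both 2s, in order
var asSet = new HashSet<int> { 1, 2, 2, 3 };  // a set: silently drops one 2
Console.WriteLine(asList.Count);  // 4
Console.WriteLine(asSet.Count);   // 3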
Good q, and +1.
HashSet is better, because it will probably use less space, and you will have faster access to elements.
I have two questions. I was wondering if there is a class in the C# library that stores pairs of values instead of just one, so that I can store a class and an integer in the same node of a list. I think the easiest way is to just make a container class, but as this is extra work each time, I wanted to know whether I should be doing so or not. I know that later versions of .NET have tuples I could store, but that's not available to me (I am using 3.5).
I guess the bigger question is: what are the memory disadvantages of using a dictionary to store the integer-to-class map, even though I don't need O(1) access and could afford to just search a list? What is the minimum size of the hash table? Should I just make the wrapper class I need?
If you need to store an unordered list of {integer, value}, then I would suggest making the wrapper class. If you need a data structure in which you can look up integer to get value (or, look up value to get integer), then I would suggest a dictionary.
The decision of List<Tuple<T1, T2>> (or List<KeyValuePair<T1, T2>>) vs Dictionary<T1, T2> is largely going to come down to what you want to do with it.
If you're going to be storing information and then iterating over it, without needing to do frequent lookups based on a particular key value, then a List is probably what you want. Depending on how you're going to use it, a LinkedList might be even better - slightly higher memory overheads, faster content manipulation (add/remove) operations.
On the other hand, if you're going to be primarily using the first value as a key to do frequent lookups, then a Dictionary is designed specifically for this purpose. Key value searching and comparison is significantly improved, so if you do much with the keys and your list is big a Dictionary will give you a big speed boost.
Data size is important to the decision. If you're talking about a couple hundred items or less, a List is probably fine. Above that point the lookup times will probably impact more significantly on execution time, so Dictionary might be more worth it.
There are no hard and fast rules. Every use case is different, so you'll have to balance your requirements against the overheads.
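For illustration, here are both shapes side by side (the sample data is made up):

var pairs = new List<KeyValuePair<int, string>>
{
    new KeyValuePair<int, string>(1, "one"),
    new KeyValuePair<int, string>(2, "two")
};
foreach (var pair in pairs)  // cheap to iterate; searching is O(n)
    Console.WriteLine(pair.Key + " -> " + pair.Value);

var lookup = new Dictionary<int, string> { { 1, "one" }, { 2, "two" } };
string value;
if (lookup.TryGetValue(2, out value))  // keyed lookup is O(1)
    Console.WriteLine(value);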
You can use a list of KeyValuePair: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx
You can use a Tuple<T,T1>, a list of KeyValuePair<T, T1>, or an anonymous type, e.g.
var list = something.Select(x => new { Key = x.Something, Value = x.Value });
You can use either KeyValuePair or Tuple
For Tuple, you can read the following useful post:
What requirement was the tuple designed to solve?
How can I get the n-th element of a LinkedList instance? Is there a built-in way, or might I need to write my own implementation, for example an extension method?
Thanks
The ElementAt extension method will do it:
// This is 0-based of course
var value = linkedList.ElementAt(n);
Don't forget this is an O(n) operation because LinkedList<T> doesn't provide any more efficient way of accessing an item by index. If you need to do this regularly, it suggests that you shouldn't be using a linked list to start with.
You can use the ElementAt() enumerable extension method. The reason LinkedList doesn't support random access natively is because it's a rather inefficient operation for the data structure. If you're going to be doing it often you should think about using a more appropriate data structure.
You can do it with LINQ, as in list.ElementAt(n) or list.Skip(n).First(), but if you find yourself doing indexed access into a linked list you are probably doing something wrong (linked lists do not support this operation efficiently). Perhaps another data structure would be more appropriate?
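If you want the node back rather than just the value (for example, to work with Next and Previous), a sketch of the extension-method route might look like this; the method name NodeAt is invented, and it is still an O(n) walk just like ElementAt:

using System;
using System.Collections.Generic;

public static class LinkedListExtensions
{
    // Returns the node (not just the value) at a 0-based index, walking O(n).
    public static LinkedListNode<T> NodeAt<T>(this LinkedList<T> list, int index)
    {
        if (list == null) throw new ArgumentNullException("list");
        if (index < 0 || index >= list.Count) throw new ArgumentOutOfRangeException("index");

        LinkedListNode<T> node = list.First;
        for (int i = 0; i < index; i++)
            node = node.Next;
        return node;
    }
}

You could halve the average walk by starting from list.Last and stepping backwards when the index is past the midpoint, but that doesn't change the O(n) bound.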
I needed to get the second element of my list (to update a value on the 1st item based on the 2nd)
Assuming you're taking the necessary steps to ensure you do have two items, you can simply do this:
list.First.Next.Value
HashSet
The C# HashSet data structure was introduced in the .NET Framework 3.5. A full list of the implemented members can be found at the HashSet MSDN page.
Where is it used?
Why would you want to use it?
A HashSet holds a set of objects, but in a way that allows you to easily and quickly determine whether an object is already in the set. It does so by internally managing an array and storing each object at an index calculated from its hash code.
HashSet is an unordered collection containing unique elements. It has the standard collection operations Add, Remove, and Contains, but since it uses a hash-based implementation, these operations are O(1). (As opposed to List, for example, which is O(n) for Contains and Remove.) HashSet also provides standard set operations such as union, intersection, and symmetric difference.
There are different implementations of Sets. Some make insertion and lookup operations super fast by hashing elements. However, that means that the order in which the elements were added is lost. Other implementations preserve the added order at the cost of slower running times.
The HashSet class in C# goes for the first approach, thus not preserving the order of elements. It is much faster than a regular List for lookups. Some basic benchmarks showed that HashSet is decently faster when dealing with primitive types (int, double, bool, etc.), and a lot faster when working with class objects. So the point is that HashSet is fast.
The only catch with HashSet is that there is no access by index. To access elements you can either use an enumerator or use the built-in function to convert the HashSet into a List and iterate through that.
A HashSet has an internal structure (hash), where items can be searched and identified quickly. The downside is that iterating through a HashSet (or getting an item by index) is rather slow.
So why would someone want to be able to know if an entry already exists in a set?
One situation where a HashSet is useful is in getting distinct values from a list where duplicates may exist. Once an item is added to the HashSet it is quick to determine if the item exists (Contains operator).
Other advantages of the HashSet are the Set operations: IntersectWith, IsSubsetOf, IsSupersetOf, Overlaps, SymmetricExceptWith, UnionWith.
If you are familiar with the Object Constraint Language then you will recognize these set operations. You will also see that it is one step closer to an implementation of executable UML.
Simply put, and without revealing the kitchen secrets:
a set, in general, is a collection that contains no duplicate elements and whose elements are in no particular order. So a HashSet<T> is similar to a generic List<T>, but optimized for fast lookups (via a hash table, as the name implies) at the cost of losing order.
From an application perspective, if one only needs to avoid duplicates then HashSet is what you are looking for, since its Lookup, Insert, and Remove complexities are O(1) - constant. What this means is that no matter how many elements the HashSet has, it takes the same amount of time to check whether an element is present, and since inserting is O(1) too, it is perfect for this sort of thing.
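A small example of that use case, deduplicating while preserving first-seen order (the data is made up):

var seen = new HashSet<string>();
foreach (var word in new[] { "red", "blue", "red", "green" })
{
    if (seen.Add(word))           // Add returns false for a duplicate, in O(1)
        Console.WriteLine(word);  // prints red, blue, green - each only once
}
Console.WriteLine(seen.Contains("blue"));  // True, also O(1)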
I'm looking for an implementation of a Red-Black Tree in C#, with the following features:
Search, Insert and Delete in O(log n).
Members type should be generic.
Support for Comparer<T>, to sort T by any of its fields.
Searching the tree should be by that specific field, so it won't accept a T; it'll accept the type of the field it is sorted by.
Searching shouldn't be limited to exact values; it should support finding the next-lower/next-higher one.
Thank you.
You mostly just described SortedDictionary<T, U>, except for the next-lowest/next-highest value binary search, which you could implement on your own without much difficulty.
Are there specific reasons that SortedDictionary is insufficient for you?
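If the only missing piece is the next-lowest/next-highest search, one way to get it in O(log n) is to keep the data in a SortedList<TKey, TValue> instead (its Keys property exposes the sorted keys as an IList<TKey>, at the price of O(n) inserts) and binary search the keys by hand. A rough sketch, with an invented method name:

using System;
using System.Collections.Generic;

public static class SortedListExtensions
{
    // Index of the first key >= value, or list.Count if there is none
    // (a classic lower-bound binary search over the sorted Keys list).
    public static int LowerBound<TKey, TValue>(this SortedList<TKey, TValue> list, TKey value)
    {
        IList<TKey> keys = list.Keys;
        IComparer<TKey> comparer = list.Comparer;  // respect the list's own ordering
        int lo = 0, hi = list.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (comparer.Compare(keys[mid], value) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;  // keys[lo - 1] is then the next-lower key, if lo > 0
    }
}

The same search against a SortedDictionary would have to enumerate from the start, which is O(n), so if you need both O(log n) inserts and O(log n) range searches you are back to a dedicated tree such as the ones mentioned below.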
Rip the TreeSet from C5 collection libs.
This is exactly the OrderedDictionary in PowerCollections. It's pretty much identical to SortedDictionary (red black tree with generics) with the addition of the ability to set a start key/end key and scan all values in that range.
SortedDictionary only exposes a GetEnumerator() function that starts at the beginning of the collection and only allows a MoveNext() call, so even if you use LINQ there is nothing magic happening: it starts at the beginning and runs your expression on every single node, in order, until it finds those matching your LINQ expression.
OrderedDictionary has a function that gets an enumerator at or before a particular key and that does the lookup in O(log n).
A word of caution though: the enumerator in the PowerCollections OrderedDictionary is implemented using "yield", and the memory usage and enumeration performance are at least O(n^2)... you can change the implementation yourself to use a traditional enumerator, and both of these problems go away. I'll submit that patch to CodePlex if I can ever find the time.