I need a list that supports sorted insertion while allowing duplicate sort values.
SortedList and SortedSet from the System.Collections.Generic namespace, for example, do not allow duplicate keys.
Additionally, Remove should use a different value than the sort key: if I have a list of persons, I want to remove them by name, but the list must always stay sorted by age, so that I can always find the oldest person in O(1).
I tried to achieve this by inserting into a plain List using BinarySearch. That works, but removal is still O(n), which is too slow for me.
Related
I'm looking for the most efficient way in C# to store a collection of objects sorted by a Comparer on an object attribute.
Several objects can share the same attribute value, so duplicate keys occur in the collection.
The complexity of inserting into and removing from the sorted data structure should not exceed O(log N) (in Big O notation), since the attribute used for sorting will change very often and the collection has to be updated on every change to stay consistent.
The complexity of getting all elements of the data structure as a sorted list should not exceed O(1).
Candidates would be the generic SortedSet, SortedDictionary, or SortedList. All of them fail on insertion or deletion when duplicate keys are present.
A workaround might be to use nested SortedDictionaries as shown below: aggregate objects with equal keys into separate inner collections, sorted by another unique key (an ID, for example):
SortedDictionary<long, SortedDictionary<long, RankedObject>> sPresortedByRank =
    new SortedDictionary<long, SortedDictionary<long, RankedObject>>(new ByRankAscSortedDictionary());
long rank = 52;
sPresortedByRank[rank] = new SortedDictionary<long, RankedObject>(new ByIdAscSortedDictionary());
Inserting and removing elements works in O(log N), which is good. Getting all elements out of the data structure as a flat list, however, requires an expensive Queryable.SelectMany, which increases the complexity of that operation to O(N^2), which is not acceptable.
The current best attempt is to use a plain List, inserting and deleting with BinarySearch to find the indices for insertions and deletions. For inserts this gives me worst-case complexity O(log N). For deletes the average case is O(log N), since duplicate keys will be rare, but the worst case is O(N) (all objects sharing the same key). Getting all elements of the sorted list is O(1).
Can anyone think of a better way to manage an object collection in a sorted data structure that fits these needs, or is the current attempt the best one in general?
Help and well-founded opinions are appreciated. Cookies for good answers, of course.
I think the simplest solution that meets your requirements is to add tie-breaking to your comparer, so that there are no duplicates and there is a well-defined total ordering among all objects. Then you can simply use SortedSet.
If every object has an ID, for example, then instead of sorting by "rank" alone you can sort by (rank, ID) to make the comparer a total ordering.
To find all the elements with a specific rank, you would then use SortedSet.GetViewBetween() with the range (rank, min_id) to (rank, max_id).
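A minimal sketch of this idea, assuming a `(rank, id)` value tuple as the element type (the tuple's built-in ordering compares rank first, then id, which is exactly the tie-broken total ordering described above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sort by (rank, id): rank is the real sort key, id is a unique tie-breaker,
// so elements with duplicate ranks never collide inside the SortedSet.
var set = new SortedSet<(long rank, long id)>();
set.Add((52, 1));
set.Add((52, 2));   // same rank, different id -> both are kept
set.Add((10, 3));

// All elements with rank == 52: bound the view by the extreme ids.
var rank52 = set.GetViewBetween((52, long.MinValue), (52, long.MaxValue)).ToList();

Console.WriteLine(rank52.Count);   // 2
Console.WriteLine(set.Min.rank);   // 10 -> smallest rank, found in O(log N)
```

Insertion and removal are both O(log N), and `GetViewBetween` gives the duplicate-rank group without scanning the whole set.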
Iliar Turdushev's very useful hint:
Do you consider using the C5 library? It contains a TreeBag collection that satisfies all your requirements. By the way, for a plain List the complexity of inserting and removing elements is not O(log N); O(log N) is only the complexity of searching for the index where the element must be inserted or deleted. The insertion/deletion itself uses Array.Copy to shift elements, so its complexity is O(M), where M <= N.
Dictionary<Tuple<int, int>, link_data> dic_links;
I have the above declaration; I use a tuple as the dictionary key.
I want to find a value using only one of the two values in the tuple.
Is there any way to do this via an index, instead of searching the entire dictionary with foreach?
Console.WriteLine(dic_links[new Tuple<int, int>(57, **)].data);
No, it is not possible to use only a partial key to search a dictionary with O(1) performance.
Your options are either to search through all keys, or to keep a separate dictionary mapping each part of the key to the object (making sure to keep them in sync).
If you only need to search by the full key or by one component, and O(log n) is acceptable, you can use a sorted list instead (though you will not be able to search by the second component with a single sorted array).
For more ideas you can search for "range queries on dictionaries" where one would like to find "all items with key 10 to 100" which is the same problem.
No. A Dictionary is designed for efficient lookup using strict equality of the key. If you don't know the exact key, you must enumerate all elements one by one.
In your case you'll probably have duplicate values on each individual property of the tuple, so you won't be able to use a simple Dictionary<int, link_data> as an index by property. You could use either a Lookup<int, link_data> if your data are static, or a Dictionary<int, List<link_data>> if you need to add or remove elements after the index is created.
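A sketch of the Dictionary<int, List<...>> secondary index, with the question's `link_data` replaced by a string payload for illustration:

```csharp
using System;
using System.Collections.Generic;

// Primary store keyed by the full tuple; a string stands in for link_data.
var byFullKey = new Dictionary<(int a, int b), string>
{
    [(57, 1)] = "link-1",
    [(57, 2)] = "link-2",
    [(99, 1)] = "link-3",
};

// Secondary index: first tuple component -> all payloads sharing it.
// It must be updated together with byFullKey on every add/remove.
var byFirst = new Dictionary<int, List<string>>();
foreach (var kv in byFullKey)
{
    if (!byFirst.TryGetValue(kv.Key.a, out var list))
        byFirst[kv.Key.a] = list = new List<string>();
    list.Add(kv.Value);
}

Console.WriteLine(byFirst[57].Count);   // 2 -> O(1) lookup by partial key
```

The cost is extra memory and the discipline of keeping both dictionaries in sync; in exchange, lookup by the first component is a single hash probe.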
You can convert the dictionary into a nested dictionary:
Dictionary<int, Dictionary<int, link_data>> dic_links;
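A sketch of the conversion, again with a string standing in for `link_data`: the flat tuple-keyed dictionary is regrouped so that the outer key is Item1 and the inner key is Item2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Flat dictionary keyed by a tuple (string stands in for link_data).
var flat = new Dictionary<Tuple<int, int>, string>
{
    [Tuple.Create(57, 1)] = "a",
    [Tuple.Create(57, 2)] = "b",
    [Tuple.Create(99, 5)] = "c",
};

// Regroup: outer key = Item1, inner key = Item2.
var nested = flat
    .GroupBy(kv => kv.Key.Item1)
    .ToDictionary(g => g.Key,
                  g => g.ToDictionary(kv => kv.Key.Item2, kv => kv.Value));

// A lookup by the first component alone is now a single O(1) hash probe.
Console.WriteLine(nested[57].Count);   // 2
Console.WriteLine(nested[57][2]);      // b
```

Note that this only indexes by the first component; searching by the second component alone would still require a scan (or a second nested dictionary with the components swapped).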
I have a List of a record (struct) like this:
struct Rec
{
int WordID;
int NextID;
int PrevID;
}
List<Rec> recs = new List<Rec>(){...};
I need a way to find a value of type Rec in the List without searching all records, e.g. via binary search. I want its time complexity to be less than O(n).
The best way to search for an item in a list is, of course, not to have a list but a hash table.
If you have a dictionary instead of a list (or a dictionary AND a list), you can search for exact values in amortized O(1) on average.
You can also use binary search, but only if the list is sorted; there is the method List<T>.BinarySearch, and the search will be O(log n).
Sorting a list with n items is O(n log n).
Inserting n items into a hash table, by contrast, is O(n) on average; inserting one item is amortized O(1).
This means that creating the hash table (or keeping it synchronized with the list) will also be faster than sorting the list.
Consider, however, that hash tables consume more memory, because they have to keep an internal bucket array.
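Both approaches side by side, as a sketch with the question's Rec modeled as a value tuple and `WordID` as the lookup key:

```csharp
using System;
using System.Collections.Generic;

// The question's Rec struct, modeled as a tuple to keep the sketch short.
var records = new List<(int WordID, int NextID, int PrevID)>
{
    (5, 6, 4), (1, 2, 0), (9, 10, 8),
};

// Option 1: O(1) average lookup -- index the list once by WordID.
var byWordId = new Dictionary<int, (int WordID, int NextID, int PrevID)>();
foreach (var r in records)
    byWordId[r.WordID] = r;

Console.WriteLine(byWordId[9].NextID);   // 10

// Option 2: O(log n) lookup -- sort once by WordID, then BinarySearch
// with a comparer that only looks at WordID (the probe's other fields
// are dummies).
records.Sort((x, y) => x.WordID.CompareTo(y.WordID));
int idx = records.BinarySearch((9, 0, 0),
    Comparer<(int WordID, int NextID, int PrevID)>.Create(
        (x, y) => x.WordID.CompareTo(y.WordID)));
Console.WriteLine(records[idx].NextID);  // 10
```

The dictionary wins when lookups dominate; binary search avoids the extra memory of the bucket array but requires keeping the list sorted.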
You can use binary search, provided your list is sorted.
I have a Dictionary containing 10 keys, each with a list containing up to 30,000 values. The values contain a DateTime property.
I frequently need to extract a small subset of one of the keys, like a date range of 30 - 60 seconds.
Doing this is easy, but getting it to run fast is not so. What would be the most efficient way to query this in-memory data?
Thanks a lot.
Sort the lists by date first, then find the required items by binary search (say, k items) and return them. Finding the bounds of the searched range is O(log n), because you need to find the first and last index; returning the items is O(k), so in total it's O(k + log n).
// Yields the items in [startIndex, endIndex) lazily, in O(k).
IEnumerable<item> GetItems(int startIndex, int endIndex, List<item> input)
{
    for (int i = startIndex; i < endIndex; i++)
        yield return input[i];
}
1) Keep the dictionary, but use a SortedList sorted by the DateTime property, instead of a plain list, for the dictionary values.
2) Use binary search to find the indices of the lower and upper edges of your range in the sorted list.
3) Select the values in the range using SortedList.Values.Skip(lowerIndex).Take(upperIndex - lowerIndex).
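The three steps above can be sketched as follows, assuming string payloads and a hand-rolled lower-bound search over the SortedList's Keys (SortedList itself exposes no range query):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Step 1: one dictionary key's values as a SortedList keyed by timestamp.
var baseTime = new DateTime(2024, 1, 1);
var byTime = new SortedList<DateTime, string>();
for (int i = 0; i < 100; i++)
    byTime.Add(baseTime.AddSeconds(i), $"event-{i}");

// Step 2: binary search over the sorted Keys for the first index >= t.
int LowerBound(IList<DateTime> keys, DateTime t)
{
    int lo = 0, hi = keys.Count;
    while (lo < hi)
    {
        int mid = (lo + hi) / 2;
        if (keys[mid] < t) lo = mid + 1; else hi = mid;
    }
    return lo;
}

// Step 3: a 30-second window [10s, 40s) -- O(log n) to find the edges.
int lower = LowerBound(byTime.Keys, baseTime.AddSeconds(10));
int upper = LowerBound(byTime.Keys, baseTime.AddSeconds(40));
var window = byTime.Values.Skip(lower).Take(upper - lower).ToList();

Console.WriteLine(window.Count);   // 30
Console.WriteLine(window[0]);      // event-10
```

Note that SortedList insertion is O(n) due to array shifting, so this fits best when the lists are built once (or appended in timestamp order) and queried often.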
In reply to Aliostad: I don't think binary search will work if the collection is a linked list. It still takes O(n).
The fastest way is to organize the data so it is indexed by the thing you want to search on. Currently it is indexed by key, but you want to search by date, so you would be best off indexing it by date as well.
I would keep two dictionaries: one indexed as you do now, and one where the items are indexed by date. Decide on a time frame (say, one minute), add each object to a list based on the minute it happens in, and store each list in the second dictionary under that minute as the key. Then, when you want the data for a particular time frame, generate the relevant minute(s) and fetch the corresponding list(s) from the dictionary. This relies on being able to derive the other dictionary's key from the objects, though.
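A sketch of the minute-bucket index described above, with string payloads standing in for the real objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Truncate a timestamp to its minute: that minute is the bucket key.
DateTime MinuteOf(DateTime t) =>
    new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0, t.Kind);

var buckets = new Dictionary<DateTime, List<string>>();

void Add(DateTime t, string item)
{
    var key = MinuteOf(t);
    if (!buckets.TryGetValue(key, out var list))
        buckets[key] = list = new List<string>();
    list.Add(item);
}

var t0 = new DateTime(2024, 1, 1, 12, 0, 0);
Add(t0.AddSeconds(5), "a");
Add(t0.AddSeconds(50), "b");
Add(t0.AddSeconds(70), "c");   // lands in the 12:01 bucket

// Query a time frame by generating the minutes it covers.
var hits = new[] { MinuteOf(t0), MinuteOf(t0.AddMinutes(1)) }
    .Where(buckets.ContainsKey)
    .SelectMany(k => buckets[k])
    .ToList();

Console.WriteLine(hits.Count);   // 3
```

Each lookup is one hash probe per covered minute; a range of 30-60 seconds touches at most two buckets.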
I have a sorted collection (List) and I need to keep it sorted at all times.
I am currently using List.BinarySearch on the collection and then inserting the element in the right place. I have also tried sorting the list after every insertion, but the performance is unacceptable.
Is there a solution with better performance? Maybe I should use a different collection.
(I am aware of SortedList, but it is restricted to unique keys.)
PowerCollections has an OrderedBag type which may be good for what you need. From the docs:
Inserting, deleting, and looking up an element are all done in log(N) + M time, where N is the number of keys in the tree, and M is the current number of copies of the element being handled.
However, with the built-in .NET 3.5 types, using List.BinarySearch and inserting each item into the correct place is a good start - but List uses an array internally, so your performance will suffer from all the copying done on each insert.
If you can group your inserts into batches, that will improve things, but unless you can get down to a single sort operation after all your inserting, you're probably better off using OrderedBag from PowerCollections if you can.
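The BinarySearch insertion idiom mentioned above, sketched for a List<int>: when the value is not found, BinarySearch returns the bitwise complement of the index where it belongs, which is exactly the insertion point.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();

// Keep the list sorted on every insert. Duplicates are handled naturally:
// a found match just means we insert next to an equal element.
void InsertSorted(List<int> l, int value)
{
    int idx = l.BinarySearch(value);
    if (idx < 0) idx = ~idx;   // not found -> complement is the insert index
    l.Insert(idx, value);      // note: Insert shifts the array, so O(n)
}

foreach (var v in new[] { 5, 1, 9, 5, 3 })   // duplicate 5 is fine
    InsertSorted(list, v);

Console.WriteLine(string.Join(",", list));   // 1,3,5,5,9
```

The search is O(log n), but the Insert itself is O(n) from array shifting, which is the copying cost the answer above warns about.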
If you're using .NET 4, you can use a SortedSet<T>:
http://msdn.microsoft.com/en-us/library/dd412070.aspx
For .NET 3.5 and lower, see if a SortedList<TKey,TValue> works for you:
http://msdn.microsoft.com/en-us/library/ms132319.aspx
Try to aggregate insertions into batches, and sort only at the end of each batch.