I'm looking for the fastest way to check whether a List, Set, or Dictionary contains a specific keyword (a string). I don't need to store any associated data; I just want to know whether my keyword is in the collection.
I thought about some possibilities like:
Dictionary<string, bool> myDictionary = new Dictionary<string, bool>();
if (myDictionary.ContainsKey(valueToSearch))
{
// do something
}
but I don't need a value.
string[] myArray = {"key1", "key2", "key3"};
if (Array.IndexOf(myArray, valueToSearch) != -1)
{
// do something
}
Then I found:
List<string> list = new List<string>();
if (list.Contains(valueToSearch))
{
// do something
}
The lookup will happen very often and has to be very fast.
Any idea what's the fastest way to check if a value equals one of a given list of keys?
Of the standard collection types, Dictionary will be the fastest, since I don't think you have HashSet<T> in the compact framework. The other two do a sequential search.
In general, a Dictionary lookup is the usual solution to a problem like this, as long as your keys are good hash values that get a somewhat even distribution in the dictionary's lookup table.
However, there may be certain cases where a list lookup appears to run faster, depending on how the data is sorted and what exactly you are looking up.
The best way to tell is to run a profile of each case, and see which performs better.
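One way to act on the profiling advice is a small timing harness like the sketch below. The class name `LookupTiming`, the generated keys, and the iteration counts are all assumptions made for illustration. It uses `Dictionary<string, bool>` as a set because `HashSet<T>` may not be available on the Compact Framework, and the numbers it produces depend entirely on your data and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupTiming
{
    // Builds n keys "key0" .. "key{n-1}" (hypothetical test data).
    public static List<string> MakeKeys(int n)
    {
        var keys = new List<string>(n);
        for (int i = 0; i < n; i++) keys.Add("key" + i);
        return keys;
    }

    // Copies the keys into a dictionary; the bool value is a dummy.
    public static Dictionary<string, bool> ToKeySet(IEnumerable<string> keys)
    {
        var set = new Dictionary<string, bool>();
        foreach (var k in keys) set[k] = true;
        return set;
    }

    // Runs the lookup `iterations` times and returns elapsed milliseconds.
    public static long Time(Func<bool> lookup, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) lookup();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

To compare, call `LookupTiming.Time(() => keys.Contains(value), n)` and `LookupTiming.Time(() => set.ContainsKey(value), n)` with the same `value` and `n`, and try both a key near the start of the list and one near the end.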
I agree with Andy. You could also look at SortedList. It's essentially a Dictionary that's kept sorted by its keys, so a lookup can use a binary search instead of a sequential scan.
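A minimal sketch of the SortedList idea follows; the class name `SortedLookup` is hypothetical. `SortedList<TKey, TValue>.ContainsKey` does a binary search, which is O(log n), so it beats a sequential scan but is not expected to beat a hash-based Dictionary lookup on average.

```csharp
using System;
using System.Collections.Generic;

public static class SortedLookup
{
    public static SortedList<string, bool> Build(IEnumerable<string> keys)
    {
        // Ordinal comparison avoids culture-sensitive string ordering.
        var sorted = new SortedList<string, bool>(StringComparer.Ordinal);
        foreach (var k in keys) sorted[k] = true;   // the bool value is a dummy
        return sorted;
    }
}
```

Lookups then go through `sorted.ContainsKey(valueToSearch)`, exactly as with a Dictionary.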
I have code that returns a list of IWebElements and their corresponding names. My understanding is that a two-item tuple is basically the same thing, but the Dictionary uses hashing to relate the two values. What is the advantage of using a two-item Tuple over a Dictionary, or vice versa?
public Dictionary<IWebElement, string> SelectAllOptions(IWebDriver driver, ref DataObject masterData)
{
//Get the ID of the dropdown menu
DatabaseRetrieval.GetObjectRepository(ref masterData);
var strDropMenuId = masterData.DictObjectRepository["ID"];
//Find the dropdown menu and pull all options into a list
try
{
var dropMenu = new SelectElement(driver.FindElement(By.Id(strDropMenuId)));
// TODO want to know how we want this list to return.
var options = dropMenu.Options as List<IWebElement>;
if (options != null)
{
var values = options.ToDictionary(option => option, option => option.Text);
return values;
}
}
As a general rule, you do not want to "pay" * for possibilities that your program does not need. For example, if your program is interested in retrieving and processing a sequence of pairs (also known as "two-member tuples") but it does not need to perform lookups from the first member of a tuple to the second, then providing a collection of pairs is more efficient:
IEnumerable<Tuple<IWebElement, string>> SelectAllOptions(...)
This approach takes less memory, because you do not allocate space for hash buckets.
This approach takes less CPU, because there are no hash computation or collision resolution costs.
This approach does not suggest to a reader that the data is intended for lookups.
Of course if the data structure that you return is intended for lookups, then you should either return a dictionary, or construct one on the client side to transfer some of the CPU load from the server to the client.
* With memory, CPU cycles, decreased readability, etc.
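The trade-off can be sketched without Selenium. In the example below, `OptionPairs`, the string ids, and the `ToLookup` helper are hypothetical stand-ins: the producer yields pairs lazily, and only a caller that actually needs lookups pays for building a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OptionPairs
{
    // Hypothetical stand-in for the dropdown options: yields (id, text) pairs lazily.
    public static IEnumerable<Tuple<string, string>> SelectAllOptions(IEnumerable<string> texts)
    {
        int i = 0;
        foreach (var text in texts)
            yield return Tuple.Create("option" + i++, text);
    }

    // A caller that needs lookups builds the dictionary itself.
    public static Dictionary<string, string> ToLookup(IEnumerable<Tuple<string, string>> pairs)
    {
        return pairs.ToDictionary(p => p.Item1, p => p.Item2);
    }
}
```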
A Tuple<T1, T2> represents a pair of values. That pair doesn't necessarily have to mean "these two items are related". With a KeyValuePair<TKey, TValue>, you would expect that a given key leads to a value and that the two have some sort of connection to one another.
Tuples implement IComparable and IStructuralEquatable, which makes it easier to compare Tuples.
Other than that, I would look at it from a logical perspective: do I need to map a given key to a value, or do I just need to couple two values together in a case where a dedicated class would be overkill?
One downside of Tuples as I see it, is that you have to deal with properties labeled Item1 and Item2, which might make it a bit less readable.
Also, remember that Tuple is a class (an immutable one) and KeyValuePair is a struct, so when you pass them as arguments, a Tuple is passed as a reference to the same object while a KeyValuePair is copied (unless you explicitly declare the parameter ref or out).
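The class-versus-struct difference and the structural equality of tuples can be checked with a small sketch like this one (the class and method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class PairSemantics
{
    public static bool TupleEqualsByValue()
    {
        var a = Tuple.Create(1, "one");
        var b = Tuple.Create(1, "one");
        return a.Equals(b);                   // compares Item1 and Item2
    }

    public static bool TupleSameReference()
    {
        var a = Tuple.Create(1, "one");
        var b = Tuple.Create(1, "one");
        return object.ReferenceEquals(a, b);  // two separate heap objects
    }

    public static bool KeyValuePairEqualsByValue()
    {
        var a = new KeyValuePair<int, string>(1, "one");
        var b = a;                            // assignment copies the struct
        return a.Equals(b);
    }
}
```

Here the first and third methods return true and the second returns false: the two tuples are equal in content but are distinct objects.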
To add to the other answers, there are sometimes advantages to storing key-value data as a list of tuples instead of in a dictionary.
Depending on your needs, it might not be important that your lookups are fast, but it might be important that the order you insert into the list remains fixed. You can iterate through the keys and the values in a dictionary but the order is not defined.
Another advantage is that you can put as many pairs with the same first element into the list as you want, where with a dictionary you can only have one value per unique key.
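Both points can be illustrated with a short sketch (the class name `DuplicateKeys` and the sample data are made up for the example): the list keeps insertion order and accepts a repeated first element, while `Dictionary.Add` throws on a repeated key.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeys
{
    // A list accepts repeated first elements and keeps insertion order.
    public static List<Tuple<string, int>> BuildList()
    {
        return new List<Tuple<string, int>>
        {
            Tuple.Create("b", 1),
            Tuple.Create("a", 2),
            Tuple.Create("b", 3),   // same first element as index 0
        };
    }

    // Dictionary.Add throws ArgumentException when the key already exists.
    public static bool DictionaryRejectsDuplicate()
    {
        var dict = new Dictionary<string, int> { { "b", 1 } };
        try { dict.Add("b", 3); return false; }
        catch (ArgumentException) { return true; }
    }
}
```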
One advantage of tuples over dictionaries is that tuple elements can be named (with the value-tuple syntax in C# 7.0 and later).
Using List<(string text, string url)> links is more meaningful than Dictionary<string, string> links.
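As a rough sketch of how the named elements read at the call site (requires C# 7.0 or later; the class name and the example.com URLs are placeholders):

```csharp
using System.Collections.Generic;

public static class NamedLinks
{
    public static List<(string text, string url)> Build()
    {
        return new List<(string text, string url)>
        {
            ("Docs", "https://example.com/docs"),
            ("Blog", "https://example.com/blog"),
        };
    }

    // Element names replace Item1/Item2 at the call site.
    public static string Describe((string text, string url) link)
    {
        return link.text + " -> " + link.url;
    }
}
```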
I need to store a list of int-string key-value pairs with a requirement to preserve the order in which items were added. Once it is initialized, it does not change, i.e. nothing is added or removed.
At first I thought of using Dictionary<int,string>, and every time I need to access the items, using
foreach(var entry in dict.OrderBy(e=>e.Key)) { } //as Key is `int`
However, ordering on every access does not seem to be the best option.
Now I've come up with the idea of using a List<Tuple<int, string>>, since List<T> guarantees the order of items.
So, is there a better option?
Looking at the proposed possibilities:
Dictionary doesn't guarantee the order of the items
SortedDictionary sorts the items, but not in the order you added them (it sorts based on key comparison),
OrderedDictionary keeps the order, but it's not generic and would introduce unnecessary casting and boxing.
So I think you should use List<Tuple<int, string>>. It preserves order and it's good enough for iteration using foreach and indexed access. If you know the size in advance, you could use an array as well, or a read only collection type, as Cuong Le suggested in his answer.
If you allow duplicate keys, List<Tuple<int, string>> would be the best choice. To make the list read-only, you can expose a read-only wrapper after initialization:
var readonlyList = new ReadOnlyCollection<Tuple<int, string>>(yourlist);
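A slightly fuller sketch of the read-only wrapper is shown below; the class name and sample values are hypothetical. `ReadOnlyCollection<T>` lives in `System.Collections.ObjectModel` and wraps the list rather than copying it, so the original list should not be handed out or modified afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyPairs
{
    public static ReadOnlyCollection<Tuple<int, string>> Build()
    {
        var list = new List<Tuple<int, string>>
        {
            Tuple.Create(3, "three"),
            Tuple.Create(1, "one"),
        };
        // The wrapper exposes no public Add/Remove; it is a view over `list`, not a copy.
        return new ReadOnlyCollection<Tuple<int, string>>(list);
    }

    // Going through the IList<T> interface still fails at run time.
    public static bool AddIsRejected()
    {
        IList<Tuple<int, string>> view = Build();
        try { view.Add(Tuple.Create(2, "two")); return false; }
        catch (NotSupportedException) { return true; }
    }
}
```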
Although Dictionary seems suitable as a data structure, it does not guarantee order of items.
List does not seem proper as it does not fit well with the data structure you are trying to store.
You may use an OrderedDictionary which guarantees the order of items.
foreach (DictionaryEntry entry in orderedDictionary)
{
//...
}
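For comparison, here is a minimal sketch of the OrderedDictionary route (the class name and values are made up). It shows the casting that the non-generic collection requires. One caveat to keep in mind with int keys: an indexer call such as `ordered[1]` binds to the positional `this[int index]` overload, not to the key lookup.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;

public static class OrderedDemo
{
    public static List<string> ValuesInInsertionOrder()
    {
        var ordered = new OrderedDictionary();
        ordered.Add(3, "three");   // keys and values are stored as object (the int is boxed)
        ordered.Add(1, "one");
        ordered.Add(2, "two");

        var result = new List<string>();
        foreach (DictionaryEntry entry in ordered)
            result.Add((string)entry.Value);   // cast needed: the collection is not generic
        return result;
    }
}
```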
I am in the process of implementing a design where information is organized in the following way:
Multiple sources will feed us information about multiple cases.
Each source is identified by a SourceID string and each case is identified by a CaseID string.
The information packet will be encapsulated as InfoObject.
I thought of coding it as:
// MasterDB :(0..m)SourceID -- 1..n -> [CaseID, InfoObject]
//
private Dictionary<string, Dictionary<string, InfoObject>> MasterDB;
class InfoObject
{
string user;
}
This way addition and removal become very easy, as they use SourceID and CaseID as keys.
However, lookup is a little special. I want to look up a particular user (which is embedded inside the InfoObject).
How should I reorganize this so that lookup as well as addition and removal are reasonably efficient?
UPDATE:
I tried a different approach using LINQ
var targetList = from entry in MasterDB
from entrytarget in entry.Value
where (entrytarget.Value.user == username)
select entrytarget.Value;
A minor issue is that the query returns an IEnumerable. I'm not sure whether I can make LINQ produce some other type.
You could just have a separate lookup for users:
Dictionary<string, InfoObject> userLookup;
Of course, this only makes sense if you want to optimize for lookup speed. The downside is that every addition and removal now has to be done on two separate data structures, which must be kept in sync.
I would propose to maintain additionally another dictionary, mapping the user into the appropriate collection of all relevant SourceID/CaseID pairs.
I would use a Dictionary<string, IList<InfoObject>>, since nothing guarantees a user can't have more than one SourceID/CaseID pair.
Whenever you insert something in your other dictionary, you also insert the same InfoObjects in that dictionary of IList. This way the InfoObject retrieved is the same object.
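A minimal sketch of keeping the two structures in sync might look like the following. The `CaseStore` wrapper, its method names, and the public `User` field are assumptions for illustration; the original only shows `MasterDB` and a `user` field.

```csharp
using System.Collections.Generic;

public class InfoObject
{
    public string User;   // assumed public here so the index can read it
}

public class CaseStore
{
    private readonly Dictionary<string, Dictionary<string, InfoObject>> masterDb =
        new Dictionary<string, Dictionary<string, InfoObject>>();
    private readonly Dictionary<string, List<InfoObject>> byUser =
        new Dictionary<string, List<InfoObject>>();

    public void Add(string sourceId, string caseId, InfoObject info)
    {
        // Assumes the (sourceId, caseId) pair is not already present.
        Dictionary<string, InfoObject> cases;
        if (!masterDb.TryGetValue(sourceId, out cases))
            masterDb[sourceId] = cases = new Dictionary<string, InfoObject>();
        cases.Add(caseId, info);

        List<InfoObject> infos;
        if (!byUser.TryGetValue(info.User, out infos))
            byUser[info.User] = infos = new List<InfoObject>();
        infos.Add(info);   // same object instance as in masterDb
    }

    public bool Remove(string sourceId, string caseId)
    {
        Dictionary<string, InfoObject> cases;
        InfoObject info;
        if (!masterDb.TryGetValue(sourceId, out cases) || !cases.TryGetValue(caseId, out info))
            return false;
        cases.Remove(caseId);
        byUser[info.User].Remove(info);   // keep the user index in sync
        return true;
    }

    public IList<InfoObject> FindByUser(string user)
    {
        List<InfoObject> infos;
        return byUser.TryGetValue(user, out infos) ? infos : new List<InfoObject>();
    }
}
```

Routing every change through one class like this is one way to avoid the two dictionaries drifting apart; removal from the per-user list is linear in the number of that user's cases.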
I'm trying to reverse items in a dictionary in C#.
I have tried:
Dictionary<double, int> dict = new Dictionary<double, int>();
...add items to it...
var v = dict.Reverse();
However, dict.Reverse() gives me an IEnumerable<KeyValuePair<double, int>>. I was just wondering how I could turn it into a Dictionary?
Thanks in advance.
Stop!
Dictionaries and hashtables and sets have no ordering.
There is absolutely no point in sorting or changing the order.
A Dictionary isn't an ordered data structure. For Reverse to have any real meaning you'll need to use a SortedDictionary. You can get a reversed copy of a SortedDictionary by creating a new one with a Comparer that does the opposite sorting to the original (see constructor).
var reversed = new SortedDictionary<double, int>(original, new ReverseKeyComparer());
Note that ReverseKeyComparer is a fictitious class for the example.
Also - you need to know that the SortedDictionary is somewhat of a misnomer, if you equate Dictionary to map or hashtable. It uses a binary tree implementation (Red-Black, I think) with different algorithmic complexity than the hashtable implementation of Dictionary. See the Remarks sections of their respective documentation pages. If performance is critical, you might want to consider whether the ordering is truly important.
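Since ReverseKeyComparer is fictitious, here is one way it could be written for the double keys from the question. This is only a sketch; the `ReverseDemo` helper is likewise made up.

```csharp
using System.Collections.Generic;

// Hypothetical comparer: orders keys from largest to smallest.
public class ReverseKeyComparer : IComparer<double>
{
    public int Compare(double x, double y)
    {
        return y.CompareTo(x);   // swapped operands invert the default order
    }
}

public static class ReverseDemo
{
    public static SortedDictionary<double, int> Reverse(IDictionary<double, int> original)
    {
        // Copies every entry into a new tree ordered by the reversed comparer.
        return new SortedDictionary<double, int>(original, new ReverseKeyComparer());
    }
}
```

Enumerating the result then yields the entries in descending key order.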
If you want dictionaries to have a certain order, you should look into SortedDictionary.
See this article.
I want to build a 2-dimensional collection where I need unique combinations of key-value pairs. For example, Domain "Company" (Id: 1) can have MachineName "Machine1" and "Machine2", but cannot add another MachineName "Machine1" again. Another Domain "Corporate" (Id: 2) can have its own MachineName "Machine1".
Here my collection will look like this: 1-Machine1, 1-Machine2, 2-Machine1.
Adding 1-Machine1 or 2-Machine1 again should be rejected as an invalid entry.
Please suggest datatype or approach for this.
I cannot use Dict> datatype, because it may hamper performance if size grows.
So you need some kind of collection with a unique key, and each item within this collection is unique.
So really, you're talking about a dictionary where the value within the dictionary is a unique collection.
Assuming you're only talking about strings, I'd be using something like:
Dictionary<string, HashSet<string>>
Someone correct me if I'm wrong, but I think the advantage of using these generic structures is you can (right off the bat), do this:
Dictionary<string, HashSet<string>> domains = new Dictionary<string, HashSet<string>>();
domains["Domain1"] = new HashSet<string>(); // the set must exist before you can add to it
domains["Domain1"].Add("Machine1");
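A small helper can cover both the missing-domain case and the uniqueness check. The sketch below uses hypothetical names; it relies on `HashSet<T>.Add` returning false when the item is already present.

```csharp
using System.Collections.Generic;

public static class DomainMachines
{
    // Returns true if the machine was added, false if that domain already had it.
    public static bool TryAdd(Dictionary<string, HashSet<string>> domains,
                              string domain, string machine)
    {
        HashSet<string> machines;
        if (!domains.TryGetValue(domain, out machines))
        {
            machines = new HashSet<string>();
            domains[domain] = machines;   // first machine for this domain
        }
        return machines.Add(machine);     // HashSet<T>.Add reports duplicates
    }
}
```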
I'm sorry, but from your description it still sounds like a Dictionary implementation would be a good fit.
If and when the performance of the application suffers due to the speed of the dictionary, then you can revisit the problem and roll your own specifically tailored solution.
You could do something like this:
Dictionary<String, List<String>> mapping = new Dictionary<string, List<string>>();
mapping.Add("1",new List<string>());
mapping["1"].Add("Machine1");
mapping["1"].Add("Machine2");
This will give you a one to many mapping between domain and machines.
Or the NameValueCollection class would do the same.
Do you need to be able to look up the list of domains for a given machine name efficiently? Otherwise a Dictionary<string, HashSet<string>> seems like a good fit.
There also seems to be something called NameValueCollection which might be a good fit if you change the defaults so that it isn't case- or culture-sensitive.
You didn't state this as a requirement, but my guess is that you also need to be able to query the data structure for all of the machines for a specific "domain". Ex. list the machines belonging to Company 1. This is the only reason I can think of where the performance of using a Dictionary might be unacceptable (since you would have to traverse the entire list to find all of the matching entries).
In that case you might consider representing the data as a tree.
Edit:
Based on your comment above, you could just concatenate your keys as a string and use a HashSet to check if you've already stored that key.
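A sketch of the concatenated-key idea, using the "1-Machine1" notation from the question, is shown below. The class and method names are hypothetical, and it assumes domain ids are non-negative integers, so the first '-' in the key always separates the id from the machine name.

```csharp
using System.Collections.Generic;

public static class CompositeKeys
{
    // Assumes non-negative ids, so the first '-' always ends the id part.
    public static string MakeKey(int domainId, string machineName)
    {
        return domainId + "-" + machineName;
    }

    // Returns false when the (domainId, machineName) combination was already stored.
    public static bool TryAdd(HashSet<string> seen, int domainId, string machineName)
    {
        return seen.Add(MakeKey(domainId, machineName));
    }
}
```

On C# 7.0 or later, a `HashSet<(int, string)>` would avoid the string building altogether, at the cost of requiring value-tuple support.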