I am in the process of implementing a design in which information is organized as follows:
Multiple sources will feed us with multiple case information.
Each source is identified by a SourceID string and each case is identified by a CaseID string.
The information packet will be encapsulated as InfoObject.
I thought of coding it as:
// MasterDB :(0..m)SourceID -- 1..n -> [CaseID, InfoObject]
//
private Dictionary<string, Dictionary<string, InfoObject>> MasterDB;
class InfoObject
{
    public string user; // must be accessible to the lookup query below
}
This way addition and removal become very easy, as they use SourceID and CaseID as keys.
However, lookup is a little special. I want to look up entries for a particular user (which is embedded inside the InfoObject).
How should I reorganize this so that both lookup and addition/removal are reasonably efficient?
UPDATE:
I tried a different approach using LINQ
var targetList = from entry in MasterDB
from entrytarget in entry.Value
where (entrytarget.Value.user == username)
select entrytarget.Value;
A minor issue is that the result is an IEnumerable<InfoObject>. I'm not sure whether I can make LINQ produce some other kind of output.
You could just have a separate lookup for users:
Dictionary<string, InfoObject> userLookup;
Of course, this only makes sense if you want to optimize for lookup speed. The downside is that every addition and removal now has to be done on two separate data structures, which must be kept in sync.
I would propose to maintain additionally another dictionary, mapping the user into the appropriate collection of all relevant SourceID/CaseID pairs.
I would use a Dictionary<string, IList<InfoObject>>, since nothing guarantees a user can't have more than one SourceID/CaseID pair.
Whenever you insert something in your other dictionary, you also insert the same InfoObjects in that dictionary of IList. This way the InfoObject retrieved is the same object.
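A minimal sketch of that idea, assuming the user of an InfoObject does not change after it has been inserted. The CaseStore class and its method names are hypothetical, not part of the original question; the point is only that both dictionaries are updated together in one place.

```csharp
using System;
using System.Collections.Generic;

public class InfoObject
{
    public string user;
}

// Hypothetical wrapper that keeps the primary dictionary and the user index in sync.
public class CaseStore
{
    // SourceID -> (CaseID -> InfoObject), as in the question.
    private readonly Dictionary<string, Dictionary<string, InfoObject>> masterDB =
        new Dictionary<string, Dictionary<string, InfoObject>>();

    // user -> all InfoObjects for that user (the same object instances as in masterDB).
    private readonly Dictionary<string, List<InfoObject>> byUser =
        new Dictionary<string, List<InfoObject>>();

    public void Add(string sourceID, string caseID, InfoObject info)
    {
        // Drop any existing entry under the same keys so the index stays consistent.
        Remove(sourceID, caseID);

        if (!masterDB.TryGetValue(sourceID, out var cases))
        {
            cases = new Dictionary<string, InfoObject>();
            masterDB[sourceID] = cases;
        }
        cases[caseID] = info;

        if (!byUser.TryGetValue(info.user, out var list))
        {
            list = new List<InfoObject>();
            byUser[info.user] = list;
        }
        list.Add(info);
    }

    public bool Remove(string sourceID, string caseID)
    {
        if (!masterDB.TryGetValue(sourceID, out var cases) ||
            !cases.TryGetValue(caseID, out var info))
        {
            return false;
        }
        cases.Remove(caseID);
        if (cases.Count == 0) masterDB.Remove(sourceID);

        var list = byUser[info.user];
        list.Remove(info); // linear in the number of cases for this user
        if (list.Count == 0) byUser.Remove(info.user);
        return true;
    }

    public IReadOnlyList<InfoObject> FindByUser(string user)
    {
        return byUser.TryGetValue(user, out var list)
            ? list
            : (IReadOnlyList<InfoObject>)Array.Empty<InfoObject>();
    }
}
```

Lookup by user is then a single dictionary access, while addition and removal each touch both structures.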
Related
Normally, I use a dictionary like a list, but with a key of a different type. I like the ability to quickly access individual items in the dictionary without having to loop through it until I find the item with the right property (because the property I'm looking for is in the Key).
But there is another possible use of a dictionary. I could just use the Key to store property A and the Value to store property B without ever using the dictionary's special functionality. For example, I could store a list of persons just by storing the forename in the key and the family name in the value (let's assume, for the sake of simplicity, that there won't ever be two people with the same forename, because I just couldn't come up with a better example). I would only use that dictionary to loop through it in a foreach loop and add items to it (no removing, sorting or accessing individual items). There would actually be no difference between using a List<KeyValuePair<string, string>> and using a Dictionary<string, string> (at least not in the example that I gave; I know that I could, for example, store multiple items with the same key in the list).
So, to sum it up, what should I do when I don't need to use the special functionalities a dictionary provides and just use it to store something that has exactly two properties:
use a Dictionary<,>
use a List<KeyValuePair<,>>
use a List<MyType> with MyType being a custom class that contains the two properties and a constructor.
Don't use dictionaries for that.
If you don't want to create a class for this purpose, use something like List<Tuple<T1,T2>>. But keep in mind a custom class will be both more readable and more flexible.
Here's the reason: your code will be much easier to read if you use proper data structures. Using a dictionary will only confuse the reader, and you'll have problems the day a duplicate key shows up.
If someone reads your code and sees a Dictionary being used, he will assume you really mean to use a map-like structure. Your code should be clear and your intent should be obvious when reading it.
If you're concerned with performance, you should probably store the data in a List. A Dictionary has a lot of internal overhead, in both memory and CPU, which buys you nothing if you only ever iterate over the items.
If you're just concerned with readability, choose the data structure that best captures your intent. If you are storing key-value pairs (for example, custom fields in a bug tracker issue) then use a Dictionary. If you are just storing items without them having some kind of logical key, use a List.
It takes little work to create a custom class to use as an item in a List. Using a Dictionary just because it gives you a Key property for each item is a misuse of that data structure. It is easy to create a custom class that also has a Key property.
Use List<MyType> where MyType includes all the values.
The problem with the dictionary approach is that it's not flexible. If you later decide to add middle names, you'll need to redesign your whole data structure, rather than just adding another field to MyType.
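As a rough illustration of the List<MyType> option, here is a small sketch. The PersonName class and the FullNames helper are hypothetical names chosen for this example; adding a middle name later would only mean adding one more property.

```csharp
using System.Collections.Generic;

// Hypothetical item type holding the two properties from the question.
public class PersonName
{
    public string Forename { get; }
    public string FamilyName { get; }

    public PersonName(string forename, string familyName)
    {
        Forename = forename;
        FamilyName = familyName;
    }
}

public static class PersonNames
{
    // Only iterates; no keyed access is needed, so a List is sufficient.
    public static List<string> FullNames(List<PersonName> people)
    {
        var result = new List<string>();
        foreach (var p in people)
        {
            result.Add(p.Forename + " " + p.FamilyName);
        }
        return result;
    }
}
```

Unlike a Dictionary keyed by forename, this list accepts two people with the same forename without any special handling.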
I need to store a list of int-string key-value pairs, with a requirement to preserve the order in which items were added. Once it is initialized, it does not change, i.e. nothing is added or removed.
At first I thought of using Dictionary<int,string> and, every time I need to access the items, using
foreach(var entry in dict.OrderBy(e=>e.Key)) { } //as Key is `int`
However, ordering every time does not seem to be the best option.
Now I've come up with the idea of using a List<Tuple<int, string>>, since List<T> guarantees the order of items.
So, is there a better option?
Looking at the proposed possibilities:
Dictionary doesn't guarantee the order of the items.
SortedDictionary sorts the items, but not in the order you added them (it sorts based on key comparison).
OrderedDictionary keeps the order, but it's not generic and would introduce unnecessary casting and boxing.
So I think you should use List<Tuple<int, string>>. It preserves order and it's good enough for iteration using foreach and indexed access. If you know the size in advance, you could use an array as well, or a read only collection type, as Cuong Le suggested in his answer.
If you allow duplicate keys, List<Tuple<int, string>> would be the best choice. To make your list read-only, you can expose a read-only wrapper after initialization:
var readonlyList = new ReadOnlyCollection<Tuple<int, string>>(yourlist);
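Putting the two suggestions together, a minimal sketch might look like the following. The OrderedPairs class and its sample values are made up for illustration. Note that ReadOnlyCollection<T> is only a wrapper around the underlying list, so the sketch does not keep a reference to that list anywhere else.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class OrderedPairs
{
    // Builds the list once and exposes it only through a read-only wrapper.
    public static ReadOnlyCollection<Tuple<int, string>> Create()
    {
        var items = new List<Tuple<int, string>>
        {
            Tuple.Create(3, "three"),
            Tuple.Create(1, "one"),
            Tuple.Create(2, "two"),
        };
        // The wrapper enumerates the items in insertion order, not in key order.
        return items.AsReadOnly();
    }
}
```

Iterating the result with foreach yields the pairs in the order they were added, and indexed access such as pairs[0] works as with a normal list.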
Although Dictionary seems suitable as a data structure, it does not guarantee order of items.
List does not seem proper as it does not fit well with the data structure you are trying to store.
You may use an OrderedDictionary which guarantees the order of items.
foreach (DictionaryEntry entry in orderedDictionary)
{
//...
}
I've got a list which stores a number of objects. Each object has a property in the form of a variable.
I'd like to be able to check if any of the items in this list contain a certain property. Similar to the Dictionary's ContainsKey method. This data structure is to hold an extremely large amount of values, possibly even millions and I would thus like to use a data structure which can check the properties as fast as possible.
Would Dictionary be the fastest for this job, or are there faster data structures?
EDIT:
Here's a quick, small example of what I'd like to achieve:
Dictionary<string, Person> persons = new Dictionary<string, Person>(); //where string contains the Person's name
bool isPresent = persons.ContainsKey("Matt");
It sounds like you basically just need a HashSet<T> containing all the property values - assuming you really just want to know whether it's contained or not.
For example:
var allNames = new HashSet<string>(people.Select(person => person.Name));
It depends. If you can load the data into a dictionary once and then query it many times, a dictionary is a very good choice, because lookups by key take constant time on average. If several items can have the same property value, you will have to create a Dictionary<TKey,List<TValue>> or use a LINQ Lookup (see ToLookup).
However, if you have to load the list each time you query it, then there is no benefit in using a dictionary. You can detect the right properties while loading the list or, if you are querying a database, then try to load just the required data by using an appropriate where clause.
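For the case where several items share the same property value, a LINQ Lookup could be sketched as follows. The Person fields and the PersonIndex helper are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person type; only Name matters for the lookup.
public class Person
{
    public string Name;
    public int Age;
}

public static class PersonIndex
{
    // Groups people by name once; later queries by name are hash-based.
    public static ILookup<string, Person> ByName(IEnumerable<Person> people)
    {
        return people.ToLookup(p => p.Name);
    }
}
```

A lookup built this way supports Contains("Matt") for a presence check, and its indexer returns an empty sequence rather than throwing when the name is missing.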
Basically I have a Dictionary<Guid, Movie> Movies collection and search for movies using Guid, which is basically movie.Guid. It works great, but I also want to be able to search the same dictionary using movie.Name without looping through each element.
Is this possible or do I have to create another Dictionary<K, V> for this?
Just have two Dictionaries, one of them having the guid as its key and the other with the name as its key.
If you don't want to look at every element, you need to index it the other direction. This means another Dictionary to get O(1).
You can iterate across the values, but then you aren't getting the constant-time lookup a dictionary offers (which comes from the way the keys are hashed). The answer above about using two dictionaries that both reference your objects may be a good solution if you don't have too many objects to reference.
You could search with the Values property:
dictionary.Values.Where(movie => movie.Name == "Some Name")
You'll lose the efficiency of a key based look up, but it will still work.
Since dictionaries are for one-way mapping you can't get keys from values.
You'll need two dictionaries.
There is also a suggestion:
You can use a custom hash function for keys instead of GUIDs and store Movie Names hash as keys. Then you can actually perform two way search in your dictionary.
Rather than using two dictionaries, you'd be much better off using one container class that has two dictionaries inside it.
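One possible shape for such a container is sketched below, assuming movie names are unique and do not change after insertion. The MovieCatalog class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Movie type; the question only mentions Guid and Name.
public class Movie
{
    public Guid Guid;
    public string Name;
}

public class MovieCatalog
{
    private readonly Dictionary<Guid, Movie> byGuid = new Dictionary<Guid, Movie>();
    private readonly Dictionary<string, Movie> byName = new Dictionary<string, Movie>();

    // Returns false and changes nothing if either key is already taken.
    public bool Add(Movie movie)
    {
        if (byGuid.ContainsKey(movie.Guid) || byName.ContainsKey(movie.Name))
        {
            return false;
        }
        byGuid.Add(movie.Guid, movie);
        byName.Add(movie.Name, movie);
        return true;
    }

    // Removes the movie from both indexes so they never disagree.
    public bool Remove(Guid guid)
    {
        if (!byGuid.TryGetValue(guid, out var movie))
        {
            return false;
        }
        byGuid.Remove(guid);
        byName.Remove(movie.Name);
        return true;
    }

    public bool TryGetByGuid(Guid guid, out Movie movie)
    {
        return byGuid.TryGetValue(guid, out movie);
    }

    public bool TryGetByName(string name, out Movie movie)
    {
        return byName.TryGetValue(name, out movie);
    }
}
```

Keeping both dictionaries private means callers cannot update one index and forget the other.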
Some guy named Jon came up with a partial solution to this (which you could easily build upon), leaving his code here: Getting key of value of a generic Dictionary?
You can't use that dictionary to do that search with anything like the same efficiency. But you can easily just run a LINQ query against your dictionary's Values property, which is just collection of the Movie values.
var moviesIWant = from m in movieLookup.Values
                  where m.Name == "Star Wars"
                  select m;
Some thoughts:
When you find your answer though, you would not have the guids, unless they were also a property of movie.
For a small dictionary, this is just fine. For large and repeated searches, you should consider the creation of other dictionaries keyed on the other values you wish to search on. Only in this way would you achieve the speed of a guid lookup comparable to your original dictionary.
You could create another dictionary keyed by Name. Once you've done this, you could search this dictionary by its key and it would have the same efficiency as your original dictionary, even for a very large dictionary.
var moviesByName = movieLookup.Values.ToDictionary(m => m.Name, m => m); // throws ArgumentException if two movies share a name
No I don't believe it is possible. You'll have to use another dictionary.
If you are going to want to search on more movie attributes you may be better off moving the data down to a database and use that for querying. That is what databases are good for after all.
I want to build a 2-dimensional collection where I need unique combinations of key-value pairs. For example, Domain "Company" (Id: 1) can have MachineName "Machine1" and "Machine2", but another MachineName "Machine1" cannot be added to it again. Another Domain "Corporate" (Id: 2) can have its own MachineName "Machine1".
Here my collection will look like this: 1-Machine1, 1-Machine2, 2-Machine1.
Adding 1-Machine1 or 2-Machine1 again should be rejected as an invalid entry.
Please suggest datatype or approach for this.
I cannot use Dict> datatype, because it may hamper performance if size grows.
So you need some kind of collection with a unique key, and each item within this collection is unique.
So really, you're talking about a dictionary where the value within the dictionary is a unique collection.
Assuming you're only talking about strings, I'd be using something like:
Dictionary<string, HashSet<string>>
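Someone correct me if I'm wrong, but I think the advantage of using these generic structures is that you can use them right off the bat. Note that the inner HashSet must be created before the first Add, otherwise the indexer throws a KeyNotFoundException:
var domains = new Dictionary<string, HashSet<string>>();
domains["Domain1"] = new HashSet<string>(); // create the set for this domain first
bool added = domains["Domain1"].Add("Machine1"); // Add returns false for a duplicate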
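A slightly fuller sketch of the Dictionary<string, HashSet<string>> idea, wrapped in a hypothetical DomainMachines class. The inner set for a domain is created on first use, since indexing a missing key would otherwise throw, and the return value of HashSet.Add reports whether the pair was new.

```csharp
using System.Collections.Generic;

public class DomainMachines
{
    private readonly Dictionary<string, HashSet<string>> domains =
        new Dictionary<string, HashSet<string>>();

    // Returns true if the pair was added, false if it was already present.
    public bool Add(string domain, string machine)
    {
        if (!domains.TryGetValue(domain, out var machines))
        {
            machines = new HashSet<string>();
            domains[domain] = machines;
        }
        return machines.Add(machine);
    }

    public bool Contains(string domain, string machine)
    {
        return domains.TryGetValue(domain, out var machines) && machines.Contains(machine);
    }
}
```

With this, adding "Machine1" to "Company" twice is rejected the second time, while adding "Machine1" to "Corporate" is still allowed.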
I'm sorry, but from your description it still sounds like a Dictionary implementation would be a good fit.
If and when the performance of the application suffers due to the speed of the dictionary, then you can revisit the problem and roll your own specifically tailored solution.
You could do something like this:
Dictionary<String, List<String>> mapping = new Dictionary<string, List<string>>();
mapping.Add("1",new List<string>());
mapping["1"].Add("Machine1");
mapping["1"].Add("Machine2");
This will give you a one to many mapping between domain and machines.
Alternatively, the NameValueCollection class would do the same.
Do you need to be able to look up the list of domains with a given machine name efficiently? Otherwise a Dictionary<string, HashSet<string>> seems like a good fit (the .NET Hashtable class is not generic).
There also seems to be something called NameValueCollection which might be a good fit if you change the defaults so that it isn't case- or culture-sensitive.
You didn't state this as a requirement, but my guess is that you also need to be able to query the data structure for all of the machines for a specific "domain". Ex. list the machines belonging to Company 1. This is the only reason I can think of where the performance of using a Dictionary might be unacceptable (since you would have to traverse the entire list to find all of the matching entries).
In that case you might consider representing the data as a tree.
Edit:
Based on your comment above, you could just concatenate your keys as a string and use a HashSet to check if you've already stored that key.
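The concatenated-key idea could be sketched like this, assuming the domain is identified by an integer id as in the question's "1-Machine1" examples. The DomainMachineSet class and the "-" separator are choices made for this illustration.

```csharp
using System.Collections.Generic;

public class DomainMachineSet
{
    private readonly HashSet<string> keys = new HashSet<string>();

    // Builds keys such as "1-Machine1". The id part is numeric, so the first
    // "-" after the digits always marks where the machine name starts.
    private static string MakeKey(int domainId, string machineName)
    {
        return domainId + "-" + machineName;
    }

    // Returns false if this domain/machine combination was already stored.
    public bool Add(int domainId, string machineName)
    {
        return keys.Add(MakeKey(domainId, machineName));
    }

    public bool Contains(int domainId, string machineName)
    {
        return keys.Contains(MakeKey(domainId, machineName));
    }
}
```

If both parts of the key were free-form strings, a plain separator could become ambiguous, so a composite key type would be the safer choice in that case.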