Say for example I have
Dictionary<string, double> foo;
I can do
foo["hello"] = foo["hello"] + 2.0;
Or I could do
foo["hello"] += 2.0;
but the compiler just expands this to the code above. I verified that by using JetBrains dotPeek to look at the assemblies.
This seems wasteful as two key lookups are required to update. Is there a dictionary implementation that can do this in one lookup? Note I'm using a dictionary to store 100k items of geometry information from a mesh and the lookups are in an inner loop. Please no "premature optimization is the root of all evil" answers. :)
Yes I have profiled.
Using a class would probably be faster as the comments mention because:
With a struct, you must do a double look-up as mentioned in the comments.
With a class, you simply go to the memory of the class reference and can update it there.
Each Lookup:
GetHashCode
Get the bucket
Iterate through to find the right one
(This all involves reading multiple ref object values)
However, if you use a class and update its value:
Change the value at the correct position relative to that ref.
It's a single change in memory.
@George Duckett's solution should be much faster. Change to a class, get the reference, and update the object's value:
var hello = foo["hello"];
hello.howAreYou += 2.0;
By the way, this is an example case where a mutable class will win in performance over the immutable struct.
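A minimal sketch of the class-based approach described above. The holder class name Counter and the helper CounterExample.Add are hypothetical; only the field name howAreYou comes from the snippet. TryGetValue performs the one hash lookup for an existing key, and the update is then a plain field write through the reference.

```csharp
using System.Collections.Generic;

// Hypothetical mutable holder. It is a reference type, so the dictionary
// stores a reference to it rather than a copy of the value.
public sealed class Counter
{
    public double howAreYou;
}

public static class CounterExample
{
    // Adds delta to the entry for key. An existing key costs one lookup.
    public static void Add(Dictionary<string, Counter> foo, string key, double delta)
    {
        Counter hello;
        if (!foo.TryGetValue(key, out hello))
        {
            // First time the key is seen: this path pays for a second
            // hash operation (the insert).
            hello = new Counter();
            foo.Add(key, hello);
        }
        hello.howAreYou += delta; // field write, no further lookup
    }
}
```

The trade-off is one extra heap object per entry, so it is worth measuring against the plain Dictionary<string, double> on the real data.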
There's a method in ConcurrentDictionary, ConcurrentDictionary.AddOrUpdate, that does what you want. You can update an existing value in the dictionary based on its previous value in one go.
However, ConcurrentDictionary is meant for multithreaded scenarios, so I imagine it does some locking, which might defeat your optimization goal. But then again, you can always benchmark and see how it goes.
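A short sketch of how the AddOrUpdate call could look for the running example. The wrapper name AddOrUpdateExample is hypothetical. It shows a single call from the caller's point of view; it does not demonstrate that fewer hash operations happen internally, so benchmarking still applies.

```csharp
using System.Collections.Concurrent;

public static class AddOrUpdateExample
{
    // Adds delta to the value stored under key, or stores delta itself
    // if the key is not present yet. Returns the value now stored.
    public static double Add(ConcurrentDictionary<string, double> foo, string key, double delta)
    {
        // AddOrUpdate(key, addValue, updateValueFactory).
        // The update delegate may run more than once under contention,
        // so it should be free of side effects.
        return foo.AddOrUpdate(key, delta, (k, old) => old + delta);
    }
}
```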
No, there is not. As noted in the comment by bradgonesurfing, the language lacks a way to return a reference to the stored value, so when it has to change that value, it needs to find it again.
Also, you said you are storing pairs of integers. Did you think about using an array? Even a 100k-element array is not even 1 MB. And I'm sure it would be the fastest you can get.
Related
I need to modify all of the values in a Dictionary. Typically, modifying a Dictionary while enumerating it throws an exception. There are various ways to work around that, but all of the answers I've seen involve allocating temporary storage. See Editing dictionary values in a foreach loop for an example.
I would like to modify all the values without allocating any memory. Writing a custom struct enumerator for the values that disregards the dictionary version would be fine, but since all the important members of the dictionary are private, this seems impossible.
You're definitely getting into some nitty-gritty performance optimization here.
Based on the additional information you've given in the comments, it sounds like the best approach (short of upgrading your memory so you can handle a little more allocation) will probably be to take the Dictionary source code and make a new class specifically for this purpose, which doesn't increment the version field if it's only changing a value.
UPDATE:
Starting with .NET Framework 4.7.2, HashSet<T>.TryGetValue is available (see the docs and the SO post on HashSet.TryGetValue).
I have a problem with HashSet because it does not provide any method similar to TryGetValue known from Dictionary. And I need such method -- passing element to find in the set, and set returning element from its collection (when found).
Sidenote -- "why do you need element from the set, you already have that element?". No, I don't, equality and identity are two different things.
HashSet is not sealed but all its fields are private, so deriving from it is pointless. I cannot use Dictionary instead because I need SetEquals method. I was thinking about grabbing a source for HashSet and adding desired method, but the license is not truly open source (I can look, but I cannot distribute/modify). I could use reflection but the arrays in HashSet are not readonly meaning I cannot bind to those fields once per instance lifetime.
And I don't want to use full blown library for just single class.
So far I am stuck with LINQ SingleOrDefault. So the question is how do I fix this, i.e. have a HashSet with TryGetValue?
Probably you should switch from a HashSet to a SortedSet
There is a simple TryGetValue() for a SortedSet:
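// Member of a wrapper type that holds a SortedSet<T> field named sortedSet.
// First() requires: using System.Linq;
public bool TryGetValue(ref T element)
{
    // The view [element, element] contains only items the comparer treats as equal to element.
    var foundSet = sortedSet.GetViewBetween(element, element);
    if (foundSet.Count == 1)
    {
        // Hand back the instance actually stored in the set.
        element = foundSet.First();
        return true;
    }
    return false;
}
When you call it, the element only needs to have the properties used by the Comparer set. On success, the element is replaced with the one found in the set.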
I agree this is something which is basically missing. While it's only useful in rare cases, I think they're significant rare cases - most notable, key canonicalization.
I can only think of one suggestion at the moment, and it's truly foul.
You can specify your own IEqualityComparer<T> when creating a HashSet<T> - so create one which remembers the arguments to the last positive (i.e. true-returning) Equals comparison it has performed. You can then call Contains, and see what the equality comparer was asked to compare.
Caveats:
This holds on to references unnecessarily, so could end up preventing objects being garbage collected
You'd potentially want to do this on a per-thread basis (if you've got a set that isn't modified after initialization, but is then read by multiple threads, for example)
It assumes that HashSet<T> doesn't use any optimization such as "if the references are equal, don't bother consulting the equality comparer"
It's fundamentally a horrible abuse
I've been trying to think of other alternatives in terms of finding intersections, but I haven't got anywhere yet...
As noted in comments, it would be worth encapsulating this as far as possible - I suspect you only need a very limited set of operations, so I'd wrap a HashSet<T> in your own class and only expose the operations you really need - that way you get to clear the "cache" after each operation, removing my first objection above.
It still feels like a horrible abuse to me, but...
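A rough sketch of the remembering-comparer idea, wrapped as suggested so the "cache" is cleared after each operation. The names RememberingComparer and CanonicalSet are hypothetical, and the sketch is restricted to reference types. It relies on the assumption listed in the caveats: that HashSet<T> really does consult the comparer for a match. It is not thread-safe.

```csharp
using System.Collections.Generic;

// Hypothetical comparer that remembers the last pair it judged equal.
public sealed class RememberingComparer<T> : IEqualityComparer<T> where T : class
{
    private readonly IEqualityComparer<T> inner;
    public T LastX { get; private set; }
    public T LastY { get; private set; }

    public RememberingComparer(IEqualityComparer<T> inner) { this.inner = inner; }

    public bool Equals(T x, T y)
    {
        bool equal = inner.Equals(x, y);
        if (equal) { LastX = x; LastY = y; } // record only positive comparisons
        return equal;
    }

    public int GetHashCode(T obj) { return inner.GetHashCode(obj); }

    public void Clear() { LastX = null; LastY = null; }
}

// Hypothetical wrapper exposing only the operations needed.
public sealed class CanonicalSet<T> where T : class
{
    private readonly RememberingComparer<T> comparer;
    private readonly HashSet<T> set;

    public CanonicalSet(IEqualityComparer<T> inner)
    {
        comparer = new RememberingComparer<T>(inner);
        set = new HashSet<T>(comparer);
    }

    public bool Add(T item)
    {
        bool added = set.Add(item);
        comparer.Clear(); // do not keep references alive between calls
        return added;
    }

    public bool TryGetValue(T probe, out T stored)
    {
        comparer.Clear();
        stored = null;
        if (!set.Contains(probe)) return false;
        // The argument order of Equals is an implementation detail, so take
        // whichever remembered argument is not the probe itself.
        stored = ReferenceEquals(comparer.LastX, probe) ? comparer.LastY : comparer.LastX;
        // Assumption: if the comparer was never consulted, fall back to the probe.
        if (stored == null) stored = probe;
        comparer.Clear();
        return true;
    }
}
```

With StringComparer.OrdinalIgnoreCase as the inner comparer, adding "Hello" and then probing with "HELLO" should hand back the stored "Hello", which is the key canonicalization case mentioned above.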
As others have suggested, an alternative would be to use a Dictionary<TKey, TValue> and implement SetEquals yourself. That would be simple enough to do - and again, you'd want to encapsulate this in your own type. Either way, you should probably design the type itself first, and then implement it using either a HashSet<> or a Dictionary<,> as an implementation detail.
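A minimal sketch of the Dictionary-based alternative with a hand-written SetEquals. The type name DictionaryBackedSet is hypothetical. SetEquals here compares counts and then checks membership, so it does not depend on enumeration order; it assumes both sets were built with equivalent comparers.

```csharp
using System.Collections.Generic;

// Hypothetical set type that stores each element as both key and value,
// so the stored instance can be handed back.
public sealed class DictionaryBackedSet<T>
{
    private readonly Dictionary<T, T> items;

    public DictionaryBackedSet(IEqualityComparer<T> comparer = null)
    {
        items = new Dictionary<T, T>(comparer); // null means the default comparer
    }

    public int Count { get { return items.Count; } }

    public bool Add(T item)
    {
        if (items.ContainsKey(item)) return false;
        items.Add(item, item);
        return true;
    }

    public bool Contains(T item) { return items.ContainsKey(item); }

    // Returns the instance stored in the set that is equal to probe.
    public bool TryGetValue(T probe, out T stored)
    {
        return items.TryGetValue(probe, out stored);
    }

    // Order-independent: same count, and every element of other is in this set.
    public bool SetEquals(DictionaryBackedSet<T> other)
    {
        if (other.Count != Count) return false;
        foreach (T key in other.items.Keys)
        {
            if (!items.ContainsKey(key)) return false;
        }
        return true;
    }
}
```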
Sounds like you are trying to use the wrong tool. True, you can save some memory using a HashSet, but it seems to me that you are trying to achieve a different goal: getting the actual element that is merely equal to a representation.
So in reality they are two different elements. Only the memento (a unique representation) is equal.
Therefore you'd be better off using a Dictionary where you add your elements as both Key and Value. That way you can get the identical element back, but you lose SetEquals.
I suppose SetEquals, in its implementation, does not do much more than sequentially compare two HashSets in their bucket order and fail on the first inequality.
So you should be equally well off using a simple SequenceEqual() (LINQ) comparing the two Keys collections.
So this extension method could do
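// Must live in a static class; requires: using System.Linq;
// SequenceEqual is order-sensitive, so both key collections must enumerate in the same order.
public static bool SetEqual<T, G>(this IDictionary<T, G> d, IDictionary<T, G> e)
{
    return d.Keys.SequenceEqual(e.Keys);
}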
This should work, because a Dictionary basically is a HashSet with an associated value. And more appropriate to your problem. (OK, to be correct, the code should go for Dictionary<> instead of IDictionary<> because Key order matters)
If you need an IEnumerable<> on the second parameter try sorting to get a defined order (not so efficient).
Finally added in .NET 4.7.2:
HashSet.TryGetValue(T, T) Method
An SO post with more details
Hopefully I'm not blind, but I haven't seen this answer anywhere. If you want Dictionary's TryGetValue, you can just steal it.
theHashset.ToDictionary(item => item.ID).TryGetValue(key, out value);
All you need is a quick lambda for determining unique keys.
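A small sketch of the same idea with the dictionary built once and reused, since ToDictionary enumerates the whole set and allocates a new dictionary on every call. The Item class and the helper name BuildIndex are hypothetical; the ID property is taken from the one-liner.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with a unique ID.
public sealed class Item
{
    public int ID;
    public string Name;
}

public static class HashSetLookupExample
{
    // Builds an ID -> Item index from the set. ToDictionary throws
    // ArgumentException if two items produce the same key.
    public static Dictionary<int, Item> BuildIndex(HashSet<Item> theHashset)
    {
        return theHashset.ToDictionary(item => item.ID);
    }
}
```

The index then answers index.TryGetValue(key, out value) without re-enumerating the set, but it has to be rebuilt or updated whenever the set changes.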
After reading the excellent accepted answer in this question:
How is the c#/.net 3.5 dictionary implemented?
I decided to set my initial capacity to a large guess and then trim it after I read in all values. How can I do this? That is, how can I trim a Dictionary so the gc will collect the unused space later?
My goal with this is optimization. I often have large datasets, and the time penalty for small datasets is acceptable. I want to avoid the overhead of reallocating and copying the data that is incurred with small initial capacities on large datasets.
According to Reflector, the Dictionary class never shrinks. void Resize() is hard-coded to always double the size.
You can probably create a new dictionary and use the respective constructor to copy over the items. This will be quite inefficient.
Or, implement your own dictionary with the existing one as a blue-print. This is less work than you might think at first.
Be sure to benchmark both approaches.
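A minimal sketch of the first option (copying into a new dictionary). The helper name CopyTrimmed is hypothetical. As far as I know, the copy constructor sizes the new table from the source's Count, so the oversized original can be dropped afterwards and collected.

```csharp
using System.Collections.Generic;

public static class DictionaryTrimExample
{
    // Returns a copy whose capacity is derived from source.Count rather
    // than from the large initial guess. Every key is rehashed on the way.
    public static Dictionary<TKey, TValue> CopyTrimmed<TKey, TValue>(Dictionary<TKey, TValue> source)
    {
        // Keep the same comparer so lookups behave identically.
        return new Dictionary<TKey, TValue>(source, source.Comparer);
    }
}
```

Usage would be along the lines of data = DictionaryTrimExample.CopyTrimmed(data); once no other reference to the old instance remains, the GC can reclaim it.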
In .NET 5 there is the method TrimExcess doing exactly what you're asking:
"Sets the capacity of this dictionary to what it would be if it had been originally initialized with all its entries."
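A short sketch of the load-then-trim pattern using TrimExcess. The helper name Load and the squared values are purely illustrative. Both TrimExcess and EnsureCapacity on Dictionary need a runtime that ships them (.NET Core 2.1 or later, to my knowledge); they are not in .NET Framework.

```csharp
using System.Collections.Generic;

public static class TrimExcessExample
{
    // Allocates with a large guess, fills in the actual data, then trims.
    public static Dictionary<int, int> Load(int capacityGuess, int actualCount)
    {
        var data = new Dictionary<int, int>(capacityGuess);
        for (int i = 0; i < actualCount; i++)
        {
            data[i] = i * i; // stand-in for reading real values
        }
        data.TrimExcess(); // shrink the backing storage to fit the entries
        return data;
    }
}
```

Calling EnsureCapacity(0) on the result returns the current capacity without growing anything, which is a convenient way to check that the trim took effect.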
You might consider putting your data in a list first. Then you know the list's size, and can create a dictionary with that capacity (now exactly right for the data you want) and populate it.
Allowing the list to dynamically resize (as you add the elements) should be cheaper than allowing a dictionary to resize. (But, as others have noted, test the performance yourself!) Resizing a dictionary involves a rehashing operation, which means every element's GetHashCode will get called again, as well as the reference being copied into the new data structure. Resizing a list just means copying the references, so should be cheaper.
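A minimal sketch of the list-first approach described above. The helper name Build and the string/int element types are assumptions chosen for illustration.

```csharp
using System.Collections.Generic;

public static class ListFirstExample
{
    // Buffers the incoming pairs in a list, then creates the dictionary
    // with exactly the capacity the data needs.
    public static Dictionary<string, int> Build(IEnumerable<KeyValuePair<string, int>> source)
    {
        // Growing the list only copies elements; no hashing happens here.
        var buffer = new List<KeyValuePair<string, int>>(source);

        // The size is known now, so the dictionary never has to resize.
        var result = new Dictionary<string, int>(buffer.Count);
        foreach (KeyValuePair<string, int> pair in buffer)
        {
            result.Add(pair.Key, pair.Value); // each key is hashed once
        }
        return result;
    }
}
```

The cost is holding the data twice for a moment, so this trades peak memory for fewer rehashes.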
I had two questions. I was wondering if there is an easy class in the C# library that stores pairs of values instead of just one, so that I can store a class and an integer in the same node of the list. I think the easiest way is to just make a container class, but that is extra work each time, so I wanted to know whether I should be doing so or not. I know that later versions of .NET (I am using 3.5) have tuples that I could store, but those are not available to me.
I guess the bigger question is what are the memory disadvantages of using a dictionary to store the integer class map even though I don't need to access in O(1) and could afford to just search the list? What is the minimum size of the hash table? Should I just make the wrapper class I need?
If you need to store an unordered list of {integer, value}, then I would suggest making the wrapper class. If you need a data structure in which you can look up integer to get value (or, look up value to get integer), then I would suggest a dictionary.
The decision of List<Tuple<T1, T2>> (or List<KeyValuePair<T1, T2>>) vs Dictionary<T1, T2> is largely going to come down to what you want to do with it.
If you're going to be storing information and then iterating over it, without needing to do frequent lookups based on a particular key value, then a List is probably what you want. Depending on how you're going to use it, a LinkedList might be even better - slightly higher memory overheads, faster content manipulation (add/remove) operations.
On the other hand, if you're going to be primarily using the first value as a key to do frequent lookups, then a Dictionary is designed specifically for this purpose. Key value searching and comparison is significantly improved, so if you do much with the keys and your list is big a Dictionary will give you a big speed boost.
Data size is important to the decision. If you're talking about a couple hundred items or less, a List is probably fine. Above that point the lookup times will probably impact more significantly on execution time, so Dictionary might be more worth it.
There are no hard and fast rules. Every use case is different, so you'll have to balance your requirements against the overheads.
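To make the trade-off concrete, here is a small sketch of the same lookup against both structures. The class and method names are hypothetical. KeyValuePair is used for the list because it is available on .NET 3.5, where Tuple is not.

```csharp
using System.Collections.Generic;

public static class PairLookupExample
{
    // List of pairs: cheap to store and iterate, but a lookup scans
    // the elements one by one (linear in the list length).
    public static bool TryFindInList(List<KeyValuePair<int, string>> list, int key, out string value)
    {
        foreach (KeyValuePair<int, string> pair in list)
        {
            if (pair.Key == key)
            {
                value = pair.Value;
                return true;
            }
        }
        value = null;
        return false;
    }

    // Dictionary: more memory per entry, but a lookup is a single hash probe.
    public static bool TryFindInDictionary(Dictionary<int, string> map, int key, out string value)
    {
        return map.TryGetValue(key, out value);
    }
}
```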
You can use a list of KeyValuePair: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx
You can use a Tuple<T,T1>, a list of KeyValuePair<T, T1> - or, an anonymous type, e.g.
var list = something.Select(x => new { Key = x.Something, Value = x.Value });
You can use either KeyValuePair or Tuple
For Tuple, you can read the following useful post:
What requirement was the tuple designed to solve?
I am designing a C# class that contains a string hierarchy, where each string has 0 or 1 parents.
My inclination is to implement this with a Dictionary<string,string> where the key is the child and value is the parent. The dictionary may have a large amount of values, but I can't say the exact size. This seems like it should perform faster than creating composite wrapper with references to the parent, but I could be wrong.
Is there an alternative approach I can take that will ensure better performance speed?
Retrieving values from a Dictionary<K,V> is extremely fast (close to O(1), i.e., almost constant time lookup regardless of the size of the collection) because the underlying implementation uses a hash table. Of course, if the key type uses a terrible hashing algorithm then the performance can degrade, but you can rest assured that this is likely not the case for the framework's string type.
However, as I asked in my comment, you need to answer a few questions:
Define what performance metric is most important, i.e., Time (CPU) or space (memory).
What are your requirements? How will this be used? What's your worst case scenario? Is this going to hold a ton of data with relatively infrequent lookups, do many lookups need to be performed in a short amount of time, or do both hold true?
The Dictionary<K,V> class also uses an array internally which will grow as you add items. Is this okay for you? Again, you need to be more specific in terms of your requirements before anyone can give you a complete answer.
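For reference, a minimal sketch of the child-to-parent dictionary from the question, with a helper that walks up the hierarchy. The names HierarchyExample and GetAncestors are hypothetical, and the sketch assumes the data contains no cycles (a cycle would loop forever).

```csharp
using System.Collections.Generic;

public static class HierarchyExample
{
    // parentOf maps child -> parent; a string with no parent has no entry.
    // Returns the chain of ancestors, nearest first.
    public static List<string> GetAncestors(Dictionary<string, string> parentOf, string node)
    {
        var ancestors = new List<string>();
        string parent;
        // One hash lookup per level of the hierarchy.
        while (parentOf.TryGetValue(node, out parent))
        {
            ancestors.Add(parent);
            node = parent;
        }
        return ancestors;
    }
}
```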
Using a Dictionary will be slower than using direct references, because the Dictionary will have to compute a hash etc. If you really only need the parent and not the child's operation (which I doubt), then you could store the Strings in an array together with the index of the parent String.
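A rough sketch of the array layout suggested above: the strings in one array and, in a parallel array, the index of each string's parent. The type name StringHierarchy and the use of -1 for "no parent" are assumptions made for illustration.

```csharp
public sealed class StringHierarchy
{
    private readonly string[] names;
    private readonly int[] parentIndex; // parentIndex[i] is the parent of names[i]; -1 marks a root

    public StringHierarchy(string[] names, int[] parentIndex)
    {
        this.names = names;
        this.parentIndex = parentIndex;
    }

    // Returns the parent of the string at the given index, or null for a root.
    // This is two array reads; no hashing is involved.
    public string GetParent(int index)
    {
        int p = parentIndex[index];
        return p < 0 ? null : names[p];
    }
}
```

This only helps if callers already hold the index of a string; finding the index from the string itself would need a search or a dictionary again.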