Hopefully simple question about modifying dictionaries in C# - c#

I have a huge dictionary of blank values in a variable called questions like so:
struct movieuser {blah blah blah}
Dictionary<movieuser, float> questions = new Dictionary<movieuser, float>();
So I am looping through this dictionary and need to fill in the "answers", like so:
for(var k = questions.Keys.GetEnumerator();k.MoveNext(); )
{
questions[k.Current] = retrieveGuess(k.Current.userID, k.Current.movieID);
}
Now, this doesn't work, because I get an InvalidOperationException from trying to modify the dictionary I am looping through. However, you can see that the code should work fine - since I am not adding or deleting any values, just modifying the value. I understand, however, why it is afraid of my attempting this.
What is the preferred way of doing this? I can't figure out a way to loop through a dictionary WITHOUT using iterators.
I don't really want to create a copy of the whole array, since it is a lot of data and will eat up my RAM like it's still Thanksgiving.
Thanks,
Dave

Matt's answer, getting the keys first, separately is the right way to go. Yes, there'll be some redundancy - but it will work. I'd take a working program which is easy to debug and maintain over an efficient program which either won't work or is hard to maintain any day.
Don't forget that if you make MovieUser a reference type, the array will only be the size of as many references as you've got users - that's pretty small. A million users will only take up 4MB or 8MB on x64. How many users have you really got?
Your code should therefore be something like:
IEnumerable<MovieUser> users = RetrieveUsers();
IDictionary<MovieUser, float> questions = new Dictionary<MovieUser, float>();
foreach (MovieUser user in users)
{
questions[user] = RetrieveGuess(user);
}
If you're using .NET 3.5 (and can therefore use LINQ), it's even easier:
IDictionary<MovieUser, float> questions =
RetrieveUsers().ToDictionary(user => user, user => RetrieveGuess(user));
Note that if RetrieveUsers() can stream the list of users from its source (e.g. a file) then it will be efficient anyway, as you never need to know about more than one of them at a time while you're populating the dictionary.
A few comments on the rest of your code:
Code conventions matter. Capitalise the names of your types and methods to fit in with other .NET code.
You're not calling Dispose on the IEnumerator<T> produced by the call to GetEnumerator. If you just use foreach your code will be simpler and safer.
MovieUser should almost certainly be a class. Do you have a genuinely good reason for making it a struct?

Is there any reason you can't just populate the dictionary with both keys and values at the same time?
foreach(var key in someListOfKeys)
{
questions.Add(key, retrieveGuess(key.userID, key.movieID));
}

Store the dictionary keys in a temporary collection, then loop over the temporary collection and use each key as your indexer parameter. This should get you around the exception.
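A minimal sketch of the temporary-key-collection approach described above. The generic helper FillValues and the sample data are hypothetical names made up for illustration; only the keys are copied, not the values.

```csharp
using System;
using System.Collections.Generic;

static class SnapshotKeysExample
{
    // Overwrites every value in the dictionary with compute(key).
    // The keys are copied into a temporary list first, so the dictionary
    // itself is never being enumerated while it is modified.
    public static void FillValues<TKey>(Dictionary<TKey, float> dictionary, Func<TKey, float> compute)
    {
        var keys = new List<TKey>(dictionary.Keys); // snapshot of the keys only
        foreach (TKey key in keys)
        {
            dictionary[key] = compute(key);
        }
    }

    static void Main()
    {
        // Hypothetical data: movie IDs mapped to blank guesses.
        var questions = new Dictionary<int, float> { { 1, 0f }, { 2, 0f }, { 3, 0f } };
        FillValues(questions, movieId => movieId * 1.5f);
        foreach (KeyValuePair<int, float> pair in questions)
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

The extra memory is one list entry per key, which is usually far smaller than a copy of the whole dictionary.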

Related

Modify dictionary values without allocating

I need to modify all of the values in a Dictionary. Typically, modifying a Dictionary while enumerating it throws an exception. There are various ways to work around that, but all of the answers I've seen involve allocating temporary storage. See Editing dictionary values in a foreach loop for an example.
I would like to modify all the values without allocating any memory. Writing a custom struct enumerator for the values that disregarded the dictionary version would be fine, but since all the important members of the dictionary are private, this seems impossible.
You're definitely getting into some nitty-gritty performance optimization here.
Based on the additional information you've given in the comments, it sounds like the best approach (short of upgrading your memory so you can handle a little more allocation) will probably be to take the Dictionary source code and make a new class specifically for this purpose, which doesn't increment the version field if it's only changing a value.

How can I know if I should access value by index or key in an OrderedDictionary in C#?

All this time I was using Dictionary to store key/value pairs until I came across this new class called OrderedDictionary which has got an additional feature of accessing data through index.
So, I wanted to know when could/would I be running into any situation that would ask me to access value through index when I have the key already. I have a small snippet below.
OrderedDictionary od = new OrderedDictionary();
od.Add("Key1", "Val1");
od.Add("Key2", "Val2");
od.Add("Key3", "Val3");
od.Add("Key4", "Val4");
The code above may not be the most appropriate example, but I would really appreciate it if someone could give a better one to answer my question.
Many Thanks!
I wanted to know when could/would I be running into any situation that would ask me to access value through index when I have the key already
I follow the YAGNI principle - You Aren't Gonna Need It. If you already know the key, then what value is there in accessing by index? The point of a dictionary is to do FAST lookups by key (by not scanning the entire collection). With an OrderedDictionary, lookups are still fast, but inserts and updates are marginally slower because the structure must keep the keys and indices in sync. Plus, the current framework implementation is not generic, so you'll have to do more casting, but there are plenty of 3rd party generic implementations out there. The fact that MS did not create a generic implementation may tell you something about the value of that type overall.
So the situation you "could" run into is needing to access the values in a particular order (an OrderedDictionary preserves the order in which items were added). In that case you'll need to decide if you do that often enough to warrant the overhead of an OrderedDictionary or if you can just use LINQ queries to order the items outside of the structure.
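To make the two access paths concrete, here is a small sketch using the non-generic OrderedDictionary from System.Collections.Specialized with the keys from the question. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Specialized;

static class OrderedDictionaryAccessExample
{
    // Builds the same kind of dictionary as in the question.
    public static OrderedDictionary Build()
    {
        var od = new OrderedDictionary();
        od.Add("Key1", "Val1");
        od.Add("Key2", "Val2");
        od.Add("Key3", "Val3");
        return od;
    }

    static void Main()
    {
        OrderedDictionary od = Build();

        // Lookup by key: the indexer takes an object key and returns object, hence the cast.
        string byKey = (string)od["Key2"];

        // Lookup by position: the int indexer returns the value at that zero-based insertion position.
        string byIndex = (string)od[1];

        Console.WriteLine(byKey + " " + byIndex);
    }
}
```

Because the class is not generic, every read needs a cast, which is the extra casting mentioned above.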
Theory of Hash based collections
Before choosing between Dictionary and OrderedDictionary, let's look how some collections are built.
Arrays
Arrays provide constant-time access when you know the index of your value.
So keys must be integers. If you don't know the index, you must traverse the full collection to check your value is the one you're looking for.
Dictionaries
The purpose of a dictionary is to provide (relatively) constant-time access to any value in it when the key is not an integer. However, since there is not always a perfect hash function to get an integer from a value, there will be collisions of hash codes, and when several values have the same hash code, they are added to an array. Searching for these collided values will be slower (since it must traverse the array).
OrderedDictionary
OrderedDictionary is a kind of mix between the two previous collections. Index search will/should be the fastest (however, you need to profile to be sure of that point). The problem with index search is that, apart from special cases, you don't know the index at which your value was stored, so you must rely on the key. Which makes me wonder, why would you need an OrderedDictionary?
As one comment implies, I would be very interested to know what's your use case for such a collection. Most of the times, you either know the index or don't know it, because it relies on the value nature. So you should either use an array or a Dictionary, not a mix of both.
Two very different use cases:
KeyValuePair<string, string>[] values = new KeyValuePair<string, string>[4];
values[0] = new KeyValuePair<string, string>("Key1", "Value1");
// And so on...
// Or
Dictionary<string, Person> persons = new Dictionary<string, Person>();
var asker = new Person { FirstName = "pradeep", LastName=" pradyumna" };
persons.Add(asker.Key, asker);
// Later in the code, you cannot know the index of the person without having the person instance.

How to optimize the update of a C# dictionary with a single key lookup?

Say for example I have
Dictionary<string, double> foo;
I can do
foo["hello"] = foo["hello"] + 2.0
Or I could do
foo["hello"] += 2.0
but the compiler just expands this to the code above. I verified that by using JetBrains dotPeek to look at the assemblies.
This seems wasteful as two key lookups are required to update. Is there a dictionary implementation that can do this in one lookup? Note I'm using a dictionary to store 100k items of geometry information from a mesh and the lookups are in an inner loop. Please no "premature optimization is the root of all evil" answers. :)
Yes I have profiled.
Using a class would probably be faster as the comments mention because:
With a struct, you must do a double look-up as mentioned in the comments.
With a class, you simply go to the memory of the class reference and can update it there.
Each Lookup:
GetHashCode
Get the bucket
Iterate through to find the right one
(This all involves reading multiple ref object values)
However, if you use a class and update its value:
Change the value at the correct position relative to that ref.
It's a single change in memory.
@George Duckett's solution should be much faster. Change to a class and get the ref and update the object's value:
var hello = foo["hello"];
hello.howAreYou += 2.0;
By the way, this is an example case where a mutable class will win in performance over the immutable struct.
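A fuller sketch of the mutable-class idea, under the assumption that the stored values can be wrapped in a small reference type. Accumulator and AddTo are hypothetical names, not from the thread. When the key already exists, the update costs one TryGetValue call and an in-place field write.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable wrapper around the double being accumulated.
sealed class Accumulator
{
    public double Value;
}

static class SingleLookupExample
{
    // Adds delta to the entry for key. An existing key needs only one lookup.
    public static void AddTo(Dictionary<string, Accumulator> table, string key, double delta)
    {
        Accumulator entry;
        if (table.TryGetValue(key, out entry))
        {
            entry.Value += delta; // mutate the object in place, no second lookup
        }
        else
        {
            // First time this key is seen: inserting costs a second lookup.
            table[key] = new Accumulator { Value = delta };
        }
    }

    static void Main()
    {
        var foo = new Dictionary<string, Accumulator>();
        AddTo(foo, "hello", 2.0);
        AddTo(foo, "hello", 2.0);
        Console.WriteLine(foo["hello"].Value);
    }
}
```

The trade-off is one extra heap object per entry, so it is worth profiling against the plain Dictionary<string, double> version.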
There's a method in ConcurrentDictionary, ConcurrentDictionary.AddOrUpdate, that does what you want. You can update an existing value in the dictionary based on its previous value in one go.
However, the concurrent dictionary is supposed to be used in multiple thread situations, so I can imagine it does some locking which might defeat your optimization goal. But then again, you can always benchmark and see how it goes.
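For reference, a short sketch of what the AddOrUpdate call looks like. The helper name Bump is made up; AddOrUpdate itself takes the key, the value to add when the key is missing, and a delegate that computes the new value from the old one. Whether this beats two plain lookups is something only a benchmark can tell.

```csharp
using System;
using System.Collections.Concurrent;

static class AddOrUpdateExample
{
    // Inserts delta if the key is missing, otherwise stores old + delta.
    // Returns the value now stored for the key.
    public static double Bump(ConcurrentDictionary<string, double> table, string key, double delta)
    {
        return table.AddOrUpdate(key, delta, (k, old) => old + delta);
    }

    static void Main()
    {
        var foo = new ConcurrentDictionary<string, double>();
        Bump(foo, "hello", 2.0);
        double current = Bump(foo, "hello", 2.0);
        Console.WriteLine(current);
    }
}
```

Note that the update delegate may be invoked more than once when several threads contend for the same key, so it should be free of side effects.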
No, it is not. As noted in the comment by bradgonesurfing, the language lacks a way to return reference to the stored value, so when it has to change that value, it needs to find it again.
Also, you said you are storing pairs of integers. Did you think about using an array? Even a 100k-element array is not even 1MB big. And I'm sure it would be the fastest you can get.

C# dictionary vs list usage

I had two questions. I was wondering if there is an easy class in the C# library that stores pairs of values instead of just one, so that I can store a class and an integer in the same node of the list. I think the easiest way is to just make a container class, but this is extra work each time, so I wanted to know whether I should be doing so or not. I know that in later versions of .NET (I am using 3.5) there are tuples that I can store, but that's not available to me.
I guess the bigger question is what are the memory disadvantages of using a dictionary to store the integer class map even though I don't need to access in O(1) and could afford to just search the list? What is the minimum size of the hash table? Should I just make the wrapper class I need?
If you need to store an unordered list of {integer, value}, then I would suggest making the wrapper class. If you need a data structure in which you can look up integer to get value (or, look up value to get integer), then I would suggest a dictionary.
The decision of List<Tuple<T1, T2>> (or List<KeyValuePair<T1, T2>>) vs Dictionary<T1, T2> is largely going to come down to what you want to do with it.
If you're going to be storing information and then iterating over it, without needing to do frequent lookups based on a particular key value, then a List is probably what you want. Depending on how you're going to use it, a LinkedList might be even better - slightly higher memory overheads, faster content manipulation (add/remove) operations.
On the other hand, if you're going to be primarily using the first value as a key to do frequent lookups, then a Dictionary is designed specifically for this purpose. Key value searching and comparison is significantly improved, so if you do much with the keys and your list is big a Dictionary will give you a big speed boost.
Data size is important to the decision. If you're talking about a couple hundred items or less, a List is probably fine. Above that point the lookup times will probably impact more significantly on execution time, so Dictionary might be more worth it.
There are no hard and fast rules. Every use case is different, so you'll have to balance your requirements against the overheads.
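To illustrate the trade-off, here is a sketch of the same lookup done against a List<KeyValuePair<string, int>> and against a Dictionary<string, int>. The class name, method name and sample data are hypothetical. KeyValuePair is available in .NET 3.5, so no Tuple is needed.

```csharp
using System;
using System.Collections.Generic;

static class PairStorageExample
{
    // Linear scan over the list: O(n) per lookup, but no hashing and little overhead.
    // On a miss, value is set to 0 and the method returns false.
    public static bool TryFind(List<KeyValuePair<string, int>> pairs, string key, out int value)
    {
        foreach (KeyValuePair<string, int> pair in pairs)
        {
            if (pair.Key == key)
            {
                value = pair.Value;
                return true;
            }
        }
        value = 0;
        return false;
    }

    static void Main()
    {
        var list = new List<KeyValuePair<string, int>>();
        list.Add(new KeyValuePair<string, int>("alpha", 1));
        list.Add(new KeyValuePair<string, int>("beta", 2));

        var dictionary = new Dictionary<string, int> { { "alpha", 1 }, { "beta", 2 } };

        int fromList;
        int fromDictionary;
        TryFind(list, "beta", out fromList);                  // scans the list
        dictionary.TryGetValue("beta", out fromDictionary);   // hashes the key
        Console.WriteLine(fromList + " " + fromDictionary);
    }
}
```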
You can use a list of KeyValuePair: http://msdn.microsoft.com/en-us/library/5tbh8a42.aspx
You can use a Tuple<T,T1>, a list of KeyValuePair<T, T1> - or, an anonymous type, e.g.
var list = something.Select(x => new { Key = x.Something, Value = x.Value });
You can use either KeyValuePair or Tuple
For Tuple, you can read the following useful post:
What requirement was the tuple designed to solve?

Collection that lets access item by key but doesn't require duplicate checking on addition?

I'm asking for something that's a bit weird, but here is my requirement (which is all a bit computation intensive, which I couldn't find anywhere so far)..
I need a collection of <TKey, TValue> of about 30 items. But the collection is used in massively nested foreach loops that would iterate possibly almost up to a billion times, seriously. The operations on collection are trivial, something that would look like:
Dictionary<Position, Value> _cells = new Dictionary<Position, Value>();
_cells.Clear();
_cells.Add(Position.p1, v1);
_cells.Add(Position.p2, v2);
//etc
In short, nothing more than addition of about 30 items and clearing of the collection. Also the values will be read from somewhere else at some point. I need this reading/retrieval by the key. So I need something along the lines of a Dictionary. Now since I'm trying to squeeze out every ounce from the CPU, I'm looking for some micro-optimizations as well. For one, I do not require the collection to check if a duplicate already exists while adding (this typically makes a dictionary slower when compared to a List<T> for addition). I know I won't be passing duplicates as keys.
Since Add method would do some checks, I tried this instead:
_cells[Position.p1] = v1;
_cells[Position.p2] = v2;
//etc
But this is still about 200 ms slower for about 10k iterations than a typical List<T> implementation like this:
List<KeyValuePair<Position, Value>> _cells = new List<KeyValuePair<Position, Value>>();
_cells.Add(new KeyValuePair<Position, Value>(Position.p1, v1));
_cells.Add(new KeyValuePair<Position, Value>(Position.p2, v2));
//etc
Now that could scale to a noticeable time after full iteration. Note that in the above case I have read items from the list by index (which was OK for testing purposes). There are many problems with a regular List<T> for us, the main one being that we cannot access an item by key.
My questions in short are:
Is there a custom collection class that would let access item by key, yet bypass the duplicate checking while adding? Any 3rd party open source collection would do.
Or else please point me to a good starter as to how to implement my custom collection class from IDictionary<TKey, TValue> interface
Update:
I went by MiMo's suggestion and List was still faster. Perhaps it has got to do with overhead of creating the dictionary.
My suggestion would be to start with the source code of Dictionary<TKey, TValue> and change it to optimize for your specific situation.
You don't have to support removal of individual key/value pairs, which might help simplify the code. There also appear to be some checks on the validity of keys etc. that you could get rid of.
But this is still a few ms slower for about ten iterations than a typical List implementation like this
A few milliseconds slower for ten iterations of adding just 30 values? I don't believe that. Adding just a few values should take microscopic amounts of time, unless your hashing/equality routines are very slow. (That can be a real problem. I've seen code improved massively by tweaking the key choice to be something that's hashed quickly.)
If it's really taking milliseconds longer, I'd urge you to check your diagnostics.
But it's not surprising that it's slower in general: it's doing more work. For a list, it just needs to check whether or not it needs to grow the buffer, then write to an array element, and increment the size. That's it. No hashing, no computation of the right bucket.
Is there a custom collection class that would let access item by key, yet bypass the duplicate checking while adding?
No. The very work you're trying to avoid is what makes it quick to access by key later.
When do you need to perform a lookup by key, however? Do you often use collections without ever looking up a key? How big is the collection by the time you perform a key lookup?
Perhaps you should build a list of key/value pairs, and only convert it into a dictionary when you've finished writing and are ready to start looking up.
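A rough sketch of that last suggestion: append to a list during the hot loop and build the dictionary only once lookups are needed. The int key and string value stand in for the question's Position and Value types, which are not shown in the thread, and the names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BuildThenIndexExample
{
    // Converts the accumulated pairs into a dictionary in one pass.
    // ToDictionary throws ArgumentException on a duplicate key, so the
    // duplicate check still happens, just once at conversion time.
    public static Dictionary<int, string> Index(List<KeyValuePair<int, string>> cells)
    {
        return cells.ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    static void Main()
    {
        var cells = new List<KeyValuePair<int, string>>();

        // Write phase: cheap appends, no hashing.
        cells.Add(new KeyValuePair<int, string>(1, "v1"));
        cells.Add(new KeyValuePair<int, string>(2, "v2"));

        // Read phase: pay for hashing once, then look up by key.
        Dictionary<int, string> byKey = Index(cells);
        Console.WriteLine(byKey[2]);
    }
}
```

This only pays off if lookups happen rarely compared to writes; if every iteration needs a lookup, the conversion cost simply moves rather than disappears.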
