Quick mass-updating a Dictionary - c#

I have a Dictionary<int, int> and would like to update certain elements all at once based on their current values, e.g. changing all elements with value 10 to having value 14 or something.
I imagined this would be easy with some LINQ/lambda stuff but it doesn't appear to be as simple as I thought. My current approach is this:
List<KeyValuePair<int, int>> kvps = dictionary.Where(d => d.Value == oldValue).ToList();
foreach (KeyValuePair<int, int> kvp in kvps)
{
    dictionary[kvp.Key] = newValue;
}
The problem is that dictionary is pretty big (hundreds of thousands of elements) and I'm running this code in a loop thousands of times, so it's incredibly slow. There must be a better way...

This might be the wrong data structure. You are attempting to look up dictionary entries based on their values, which is the reverse of the usual pattern. Maybe you could store sets of keys that currently map to certain values. Then you could quickly move these sets around instead of updating each entry separately.
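A minimal sketch of the value-to-keys idea, assuming int keys and int values as in the question. The class name ReverseIndexedMap and its Set/ReplaceAll methods are hypothetical. It keeps the ordinary key-to-value map plus a second map from each value to the set of keys that currently hold it, so a mass update only visits the keys that actually change:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: forward map plus a reverse index (value -> keys).
public class ReverseIndexedMap
{
    private readonly Dictionary<int, int> forward = new Dictionary<int, int>();
    private readonly Dictionary<int, HashSet<int>> keysByValue = new Dictionary<int, HashSet<int>>();

    public int this[int key] => forward[key];

    // Adds or updates one key, keeping the reverse index in sync.
    public void Set(int key, int value)
    {
        if (forward.TryGetValue(key, out int old))
        {
            if (old == value) return;
            RemoveFromIndex(old, key);
        }
        forward[key] = value;
        if (!keysByValue.TryGetValue(value, out HashSet<int> keys))
        {
            keys = new HashSet<int>();
            keysByValue[value] = keys;
        }
        keys.Add(key);
    }

    // Cost is proportional to the number of keys holding oldValue, not to the map size.
    public void ReplaceAll(int oldValue, int newValue)
    {
        if (oldValue == newValue || !keysByValue.TryGetValue(oldValue, out HashSet<int> moved))
            return;
        keysByValue.Remove(oldValue);
        foreach (int key in moved)
            forward[key] = newValue;

        if (keysByValue.TryGetValue(newValue, out HashSet<int> existing))
            existing.UnionWith(moved);      // merge into the bucket that already exists
        else
            keysByValue[newValue] = moved;  // no bucket yet: move the whole set over
    }

    private void RemoveFromIndex(int value, int key)
    {
        HashSet<int> keys = keysByValue[value];
        keys.Remove(key);
        if (keys.Count == 0) keysByValue.Remove(value);
    }
}
```

Each Set pays a couple of extra hash operations; in exchange, ReplaceAll no longer scans hundreds of thousands of entries on every call.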

I would consider writing your own collection type to achieve this whereby keys with the same value actually share the same value instance such that changing it in one place changes it for all keys.
Something like the following (obviously, lots of code omitted here - just for illustrative purposes):
public class SharedValueDictionary : IDictionary<int, int>
{
private List<MyValueObject> values;
private Dictionary<int, MyValueObject> keys;
// Now, when you add a new key/value pair, you actually
// look in the values collection to see if that value already
// exists. If it does, you add an entry to keys that points to that existing object
// otherwise you create a new MyValueObject to wrap the value and add entries to
// both collections.
}
This scenario would require multiple versions of Add and Remove to allow for changing all keys with the same value, changing only one key of a set to be a new value, removing all keys with the same value and removing just one key from a value set. It shouldn't be difficult to code for these scenarios as and when needed.
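A compact, runnable take on the shared-value idea, under the same int/int assumption. SharedValueMap and its nested Box type are hypothetical names, and only Add and a "change all keys with this value" operation are sketched. Keys with equal values point at one Box, so renaming a value that no other box holds is a single field write; when the target value already has a box, the old box's keys are repointed instead:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keys sharing a value share one Box instance.
public class SharedValueMap
{
    private sealed class Box
    {
        public int Value;
        public readonly List<int> Keys = new List<int>(); // only needed to merge boxes
    }

    private readonly Dictionary<int, Box> boxByKey = new Dictionary<int, Box>();
    private readonly Dictionary<int, Box> boxByValue = new Dictionary<int, Box>();

    public int this[int key] => boxByKey[key].Value;

    public void Add(int key, int value)
    {
        if (boxByKey.ContainsKey(key))
            throw new ArgumentException("Key already exists.", nameof(key));
        if (!boxByValue.TryGetValue(value, out Box box))
        {
            box = new Box { Value = value };
            boxByValue[value] = box;
        }
        boxByKey.Add(key, box);
        box.Keys.Add(key);
    }

    // Changes every key currently holding oldValue to newValue.
    public void ReplaceAll(int oldValue, int newValue)
    {
        if (oldValue == newValue || !boxByValue.TryGetValue(oldValue, out Box box))
            return;
        boxByValue.Remove(oldValue);

        if (boxByValue.TryGetValue(newValue, out Box target))
        {
            // Both values exist: repoint the old box's keys at the target box.
            foreach (int key in box.Keys)
            {
                boxByKey[key] = target;
                target.Keys.Add(key);
            }
        }
        else
        {
            // Common case: one write updates every key sharing this box.
            box.Value = newValue;
            boxByValue[newValue] = box;
        }
    }
}
```

Removing keys and changing a single key would follow the same pattern of keeping Keys and boxByValue in sync.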

You need to generate a new dictionary:
d = d.ToDictionary(w => w.Key, w => w.Value == 10 ? 14 : w.Value);

I think the thing that everybody must be missing is that it is exceedingly trivial:
List<int> keys = dictionary.Keys.Where(d => d == oldValue).ToList();
You are NOT looking up keys by value (as has been offered by others).
Instead, keys.SingleOrDefault() will now by definition return the single key that equals oldValue if it exists in the dictionary. So the whole code should simplify to
if (dictionary.ContainsKey(oldValue))
    dictionary[oldValue] = newValue;
That is quick. Now I'm a little concerned that this might indeed not be what the OP intended, but it is what he had written. So if the existing code does what he needs, he will now have a highly performant version of the same :)

After the edit, this seems an immediate improvement:
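foreach (var key in dictionary.Where(d => d.Value == oldValue).Select(d => d.Key).ToList())
{
    dictionary[key] = newValue;
}
KeyValuePair<TKey, TValue>.Value is read-only, so you can't update the kvp directly; the assignment has to go through the dictionary indexer. Copying just the matching keys with ToList() first also avoids writing to the dictionary while it is being enumerated.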

Related

Remove a single value from a NameValueCollection

My data source could have duplicate keys with values.
typeA : 1
typeB : 2
typeA : 11
I chose to use NameValueCollection as it enables entering duplicate keys.
I want to remove a specific key/value pair from the collection, but NameValueCollection.Remove(key) removes all values associated with the specified key.
Is there a way to remove a single key/value pair from a NameValueCollection,
OR
Is there a better collection in C# that fits my data?
[EDIT 1]
First, thanks for all the answers :)
I think I should have mentioned that my data source is XML.
I used System.Xml.Linq.XDocument to query for type and also it was handy to remove a particular value.
Now, my question is: for large amounts of data, is using XDocument a good choice considering performance?
If not, what are the alternatives (maybe going back to NameValueCollection and using one of the techniques mentioned to remove data)?
The idea of storing multiple values with the same key is somewhat strange. But I think you can retrieve all values using GetValues, remove the one you don't need, and put the rest back using Set followed by Add calls. You can make a separate extension method for this.
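A sketch of that extension method; RemoveValue is a hypothetical name, not a framework API. It reads the stored values with GetValues, then writes back everything except the first matching value using Set for the first remaining value and Add for the rest:

```csharp
using System;
using System.Collections.Specialized;

public static class NameValueCollectionExtensions
{
    // Hypothetical helper: removes the first occurrence of `value` stored under `name`,
    // keeping the key's other values. Returns false if nothing was removed.
    public static bool RemoveValue(this NameValueCollection collection, string name, string value)
    {
        string[] values = collection.GetValues(name);   // null if the key is absent
        if (values == null) return false;

        int index = Array.IndexOf(values, value);
        if (index < 0) return false;

        if (values.Length == 1)
        {
            collection.Remove(name);                     // it was the last value: drop the key
            return true;
        }

        bool first = true;
        for (int i = 0; i < values.Length; i++)
        {
            if (i == index) continue;
            if (first)
            {
                collection.Set(name, values[i]);         // Set replaces all values of the key
                first = false;
            }
            else
            {
                collection.Add(name, values[i]);         // Add appends another value
            }
        }
        return true;
    }
}
```

Using Set for the first remaining value, rather than Remove followed by Add, keeps the key in its existing position in AllKeys.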
NameValueCollection doesn't really allow multiple entries with the same key. It merely concatenates the new values of existing keys into a comma-separated list of values (see NameValueCollection.Add).
So there really is just a single value per key. You could conceivably get the value, split it on ',' and remove the offending value.
Edit: @ElDog is correct, there is a GetValues method which does this for you, so there is no need to split.
A better option, I think, would be to use Dictionary<string, IList<int>> or Dictionary<string, ISet<int>> to store the values as discrete, erm, values.
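A minimal sketch of the Dictionary<string, IList<int>> option, using the typeA/typeB data from the question. MultiMap and its methods are hypothetical names. A List<int> is used rather than a set so that the same value can appear twice under one key:

```csharp
using System.Collections.Generic;

// Hypothetical multi-map: each key owns a list of discrete values.
public class MultiMap
{
    private readonly Dictionary<string, List<int>> map = new Dictionary<string, List<int>>();

    public void Add(string key, int value)
    {
        if (!map.TryGetValue(key, out List<int> values))
        {
            values = new List<int>();
            map[key] = values;
        }
        values.Add(value);
    }

    // Removes a single key/value pair; other values under the same key survive.
    public bool Remove(string key, int value)
    {
        if (!map.TryGetValue(key, out List<int> values) || !values.Remove(value))
            return false;
        if (values.Count == 0) map.Remove(key);   // drop keys with no values left
        return true;
    }

    public IReadOnlyList<int> GetValues(string key) =>
        map.TryGetValue(key, out List<int> values) ? values : (IReadOnlyList<int>)new int[0];
}
```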
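You may convert it to a dictionary (strictly, a Dictionary<string, string> rather than a Hashtable):
var x = new NameValueCollection();
x.Add("a", "1");
x.Add("b", "2");
x.Add("a", "1");
var y = x.AllKeys.ToDictionary(k => k, k => x[k]);
Note that x[k] returns all values of a key joined with commas, so y["a"] is "1,1".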
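Make your own extension method; it works for me. Note that it operates on a List<KeyValuePair<TKey, TValue>> rather than on a NameValueCollection:
public static class ListExtensions
{
    public static bool Remove<TKey, TValue>(
        this List<KeyValuePair<TKey, TValue>> list,
        TKey key,
        TValue value)
    {
        return list.Remove(new KeyValuePair<TKey, TValue>(key, value));
    }
}
Then call it on the list as:
list.Remove(key, value); // pass the key and the value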
Perhaps not the best way, but....
public class SingleType
{
public string Name;
public int Value;
}
List<SingleType> typeList = new List<SingleType>();
typeList.Add(new SingleType { Name = "TypeA", Value = 1 });
typeList.Add(new SingleType { Name = "TypeA", Value = 3 });
typeList.Remove(typeList.Single(t => t.Name == "TypeA" && t.Value == 1));
You can use the Dictionary collection instead:
Dictionary<string, int> dictionary = new Dictionary<string, int>();
dictionary.Add("typeA", 1);
dictionary.Add("typeB", 2);
When you try to insert typeA : 11, it will throw an exception because the key already exists, so you would have to use a new key to insert this data.
Refer to this tutorial for further help.

sort Dictionary by generic value

I have tried to sort a Dictionary object by value which is generic.
Here is my code
Dictionary<string, ReportModel> sortedDic = new Dictionary<string, ReportModel>();
Dictionary<string, ReportModel> rDic = new Dictionary<string, ReportModel>();
var ordered = sortedDic.OrderByDescending(x => x.Value.totalPurchase);
foreach (var item in ordered)
{
rDic.Add(item.Key, item.Value);
}
The variable ordered just has the same order as sortedDic.
What is wrong with this?
Any idea?
This happens because Dictionary is generally an unordered container*. When you put the data into rDic, it becomes unordered again.
To retain the desired order, you need to put the results into a container that explicitly keeps the ordering that you supply. For example, you could use a list of KeyValuePair<string,ReportModel>, like this:
IList<KeyValuePair<string,ReportModel>> ordered = sortedDic
.OrderByDescending(x => x.Value.totalPurchase)
.ToList();
* Due to the way Microsoft implemented Dictionary<K,V>, it happens to retain the insertion order as long as no items are removed, but that is incidental and undocumented, so it may change in future versions and should not be relied upon.
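A small self-contained version of the list approach. ReportModel here is a hypothetical stand-in with only a decimal totalPurchase field, since the question does not show the real class; ReportSorting is likewise an invented name:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ReportModel; only totalPurchase matters here.
public class ReportModel
{
    public decimal totalPurchase;
}

public static class ReportSorting
{
    // Materialize the ordering into a List, which (unlike Dictionary) preserves it.
    public static List<KeyValuePair<string, ReportModel>> ByPurchaseDescending(
        Dictionary<string, ReportModel> source) =>
        source.OrderByDescending(x => x.Value.totalPurchase).ToList();
}
```

Iterating the returned list visits the entries from the largest totalPurchase to the smallest; lookups by key can still go to the original dictionary.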
When adding the items back to the dictionary, it does not keep their order.
You can either:
Use the following implementation.
Use a list in the form below:
IEnumerable<KeyValuePair<string, ReportModel>> lst =
    sortedDic.OrderByDescending(x => x.Value.totalPurchase).ToArray();
[EDIT] If you don't mind changing the key (for example, keying by totalPurchase), you can use SortedDictionary<,>.

What's the best collection to use for uniquely identifying nodes?

Currently I am using a Dictionary<int,node> to store around 10,000 nodes. The key is used as an ID number for later look up and the 'node' is a class that contains some data. Other classes within the program use the ID number as a pointer to the node. (this may sound inefficient. However, explaining my reasoning for using a dictionary for this is beyond the scope of my question.)
However, 20% of the nodes are duplicates.
What I want to do is, when I add a node, check to see if it already exists. If it does, use that ID number; if not, create a new one.
This is my current solution to the problem:
public class nodeDictionary
{
Dictionary<int, node> dict = new Dictionary<int, node>( );
public int addNewNode( latLng ll )
{
node n = new node( ll );
if ( dict.ContainsValue( n ) )
{
foreach ( KeyValuePair<int, node> kv in dict )
{
if ( kv.Value == n )
{
return kv.Key;
}
}
}
else
{
if ( dict.Count != 0 )
{
int newId = dict.Last( ).Key + 1;
dict.Add( newId, n );
return newId;
}
else
{
dict.Add( 0, n );
return 0;
}
}
throw new Exception( );
}//end add new node
}
The problem with this is that when I try to add a new node to a list of 100,000 nodes, it takes 78 milliseconds to add the node. This is unacceptable because I could be adding an additional 1,000 nodes at any given time.
So, is there a better way to do this? I am not looking for someone to write the code for me; I am just looking for guidance.
It sounds like you want to
make sure that LatLng overrides Equals/GetHashCode (preferably implement the IEquatable<LatLng> interface)
stuff all the items directly into a HashSet<LatLng>
For implementing GetHashCode, see here: Why is it important to override GetHashCode when Equals method is overridden?
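A sketch of the first suggestion, assuming a coordinate type named LatLng with double latitude and longitude (the question's latLng class is not shown, so its members are guesses). Equals and GetHashCode are both derived from the two coordinates, which is what lets HashSet<LatLng> discard duplicates. HashCode.Combine requires .NET Core 2.1 or later:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type: equality and hashing use the same two fields,
// so HashSet/Dictionary lookups are O(1) on average.
public readonly struct LatLng : IEquatable<LatLng>
{
    public LatLng(double lat, double lng) { Lat = lat; Lng = lng; }

    public double Lat { get; }
    public double Lng { get; }

    public bool Equals(LatLng other) => Lat.Equals(other.Lat) && Lng.Equals(other.Lng);
    public override bool Equals(object obj) => obj is LatLng other && Equals(other);
    public override int GetHashCode() => HashCode.Combine(Lat, Lng);
}
```

Exact double comparison only treats two coordinates as duplicates when they are exactly equal; coordinates produced by different calculations may need rounding first.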
If you need to generate 'artificial' unique IDs in some fashion, I suggest you use the dictionary approach again, but 'in reverse':
// uses the same hash function for speedy lookup/insertion
IDictionary<LatLng, int> idMap = new Dictionary<LatLng, int>();
foreach (LatLng latLng in LatLngCoords)
{
if (!idMap.ContainsKey(latLng))
idMap.Add(latLng, idMap.Count+1); // to start with 1
}
You can have the idMap replace the HashSet<>; the implementation (and performance characteristics) is essentially the same but as an associative container.
Here's a lookup function to get from LatLng to Id:
int IdLookup(LatLng latLng)
{
int id;
if (idMap.TryGetValue(latLng, out id))
    return id;
throw new ArgumentException("Coordinate not in idMap");
}
You could just-in-time add it:
int IdFor(LatLng latLng)
{
int id;
if (idMap.TryGetValue(latLng, out id))
return id;
id = idMap.Count+1;
idMap.Add(latLng, id);
return id;
}
I'd add a second dictionary for the reverse direction. i.e. Dictionary<Node,int>
Then you either
Are content with reference equality and do nothing.
Create an IEqualityComparer<Node> and supply it to the dictionary
Override Equals and GetHashCode on Node
In the last two cases, a good implementation of the hash code is essential to get good performance.
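A sketch of the second dictionary combined with the IEqualityComparer<Node> option. Node, NodeCoordinateComparer and NodeRegistry are hypothetical names, and the sketch assumes two nodes are duplicates when their coordinates match. It hands out ids from a counter rather than from dict.Last():

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Node carrying the coordinates that define a "duplicate".
public class Node
{
    public Node(double lat, double lng) { Lat = lat; Lng = lng; }
    public double Lat { get; }
    public double Lng { get; }
}

// Equality lives in a comparer instead of on Node itself.
public sealed class NodeCoordinateComparer : IEqualityComparer<Node>
{
    public bool Equals(Node x, Node y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Lat.Equals(y.Lat) && x.Lng.Equals(y.Lng));

    public int GetHashCode(Node node) => HashCode.Combine(node.Lat, node.Lng);
}

public class NodeRegistry
{
    private readonly Dictionary<int, Node> byId = new Dictionary<int, Node>();
    private readonly Dictionary<Node, int> idByNode =
        new Dictionary<Node, int>(new NodeCoordinateComparer());
    private int nextId;

    // Returns the id of an equal node if one exists, otherwise registers this node.
    public int GetOrAdd(Node node)
    {
        if (idByNode.TryGetValue(node, out int id))
            return id;
        id = nextId++;
        byId.Add(id, node);
        idByNode.Add(node, id);
        return id;
    }

    public Node this[int id] => byId[id];
}
```

Both lookups are hash-based, so adding a node no longer scans the existing ones.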
Your solution is not only slow, but also wrong. The order of items in a Dictionary is undefined, so dict.Last() is not guaranteed to return the item that was added last. (Although it may often look that way.)
Using an id to identify an object in your application seems wrong too. You should consider using references to the object directly.
But if you want to use your current design and assuming that you compare nodes based on their latLng, you could create two dictionaries: the one you already have and a second one, Dictionary<latLng, int>, that can be used to efficiently find out whether a certain node already exists. And if it does, it gives you its id.
What exactly is the purpose of this code?
if ( dict.ContainsValue( n ) )
{
foreach ( KeyValuePair<int, node> kv in dict )
{
if ( kv.Value == n )
{
return kv.Key;
}
}
}
The ContainsValue searches for a value (instead of a key) and is very inefficient (O(n)). Ditto for foreach. Let alone you do both when only one is necessary (you could completely remove ContainsValue by rearranging your ifs a little)!
You should probably maintain an additional dictionary that is the "reverse" of the original one (i.e. values in the old dictionary are keys in the new one and vice versa), to "cover" your search patterns (similar to how databases can maintain multiple indexes per table to cover the multiple ways a table can be queried).
You could try using HashSet<T>
You might want to consider restructuring this to just use a List (where the 'key' is just the index into the List) instead of a Dictionary. A few advantages:
Looking up an element by integer key is now O(1) (and a very fast O(1) given that it's just an array dereference internally).
When you insert a new element, you perform an O(n) search to see whether it already exists in the list. If it does not, you've also already traversed the list and can have recorded whether you encountered an entry with a null record. If you have, that index is the new key. If not, the new key is the current list Count. You're enumerating the collection once instead of multiple times and the enumeration itself is much faster than enumerating a Dictionary.
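A sketch of the List-based layout just described, where the id is simply the list index and a removed entry leaves a null slot for a later insert to reuse. NodeList<T> and its members are hypothetical names:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: id == list index; null marks a free slot.
public class NodeList<T> where T : class
{
    private readonly List<T> items = new List<T>();
    private readonly IEqualityComparer<T> comparer;

    public NodeList(IEqualityComparer<T> comparer = null)
    {
        this.comparer = comparer ?? EqualityComparer<T>.Default;
    }

    public T this[int id] => items[id];   // O(1): plain array indexing

    // One O(n) pass: finds an equal item and remembers the first free slot on the way.
    public int GetOrAdd(T item)
    {
        int freeSlot = -1;
        for (int i = 0; i < items.Count; i++)
        {
            if (items[i] == null)
            {
                if (freeSlot < 0) freeSlot = i;
            }
            else if (comparer.Equals(items[i], item))
            {
                return i;
            }
        }
        if (freeSlot >= 0)
        {
            items[freeSlot] = item;
            return freeSlot;
        }
        items.Add(item);
        return items.Count - 1;
    }

    public void Remove(int id) => items[id] = null;
}
```

GetOrAdd is still a linear scan, so for the 100,000-node case in the question a hash-based reverse lookup will scale better; the list mainly wins on lookup by id.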

C# - Removing Items from Dictionary in while loop

I have this, and it all seems to work fine, but I'm not sure why or whether it's valid.
Dictionary<string, List<string>> test = new Dictionary<string, List<string>>();
while (test.Count > 0)
{
var obj = test.Last();
MyMethod(obj);
test.Remove(obj.Key);
}
Update: Thanks for the answers, I have updated my code to explain why I don't do Dictionary.Clear();
I don't understand why you are trying to process all Dictionary entries in reverse order, but your code is OK.
It might be a bit faster to get a list of all Keys and process the entries by key instead of counting again and again...
E.G.:
var keys = test.Keys.OrderByDescending(o => o).ToList();
foreach (var key in keys)
{
var obj = test[key];
MyMethod(obj);
test.Remove(key);
}
Dictionaries are fast when they are accessed by key. Last() is slower because it has to enumerate the whole dictionary, and re-checking Count is not necessary: you can get a list of all (unique) keys up front.
There is nothing wrong with mutating a collection in a while loop in this manner. Where you get into trouble is when you mutate a collection during a foreach block, or more generally when you use an IEnumerator<T> after the underlying collection has been mutated.
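A small demonstration of that distinction. It deliberately uses List<T>, whose enumerator throws InvalidOperationException once the list has been modified; MutationDemo and its methods are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class MutationDemo
{
    // Re-reading Count on every pass is fine: no enumerator is alive while we remove.
    public static int DrainWithWhile(List<int> list)
    {
        int removed = 0;
        while (list.Count > 0)
        {
            list.RemoveAt(list.Count - 1);
            removed++;
        }
        return removed;
    }

    // Removing inside foreach invalidates the enumerator; the next MoveNext throws.
    public static bool ForeachRemoveThrows(List<int> list)
    {
        try
        {
            foreach (int item in list)
                list.Remove(item);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```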
Although in this sample it would be a lot simpler to just call test.Clear() :)
That works fine, since you're not iterating over the dictionary while removing items. Each time you check test.Count, it is evaluated afresh.
That being said, the above code could be written much simpler and more effectively:
test.Clear();
It works because Count is updated every time you remove an item. Say the count is 3: test.Remove decrements it to 2, and so on, until the count is 0 and you break out of the loop.
Yes, this should be valid, but why not just call Dictionary.Clear()?
All you're doing is taking the last item in the collection and removing it until there are no more items left in the Dictionary.
Nothing out of the ordinary and there's no reason it shouldn't work (as long as emptying the collection is what you want to do).
So, you're just trying to clear the Dictionary, correct? Couldn't you just do the following?
Dictionary<string, List<string>> test = new Dictionary<string, List<string>>();
test.Clear();
This seems like it will work, but it looks extremely expensive. It would be a problem if you were iterating over it with a foreach loop (you can't modify a collection while you're iterating over it).
Dictionary.Clear() should do the trick (but you probably already knew that).
Despite your update, you can probably still use clear...
foreach(var item in test) {
MyMethod(item);
}
test.Clear();
Your call to .Last() is going to be extremely inefficient on a large dictionary, and won't guarantee any particular ordering of the processing regardless (the Dictionary is an unordered collection).
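I used this code to remove items conditionally:
var dict = new Dictionary<string, float>();
// Copy the keys first so the dictionary is not modified while its Keys collection is enumerated.
var keys = new string[dict.Count];
dict.Keys.CopyTo(keys, 0);
foreach (var key in keys)
{
    var v = dict[key];
    if (v < 0f) // example condition; substitute your own
    {
        dict.Remove(key);
    }
}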
Sort Hashtable by (possibly non-unique) values

I have a Hashtable that maps strings to ints. Strings are unique, but several may be mapped to the same integer.
My naive approach was to simply invert the Hashtable to a SortedList that is indexed by the Hashtable's values, but the problem is that you get a clash as soon as two of the Hashtable's strings map to the same value.
What is the most efficient way to list my entire Hashtable (keys and values) ordered by the values? (Where two values are the same, I don't care about their ordering.)
Using Linq:
hashtable.Cast<DictionaryEntry>().OrderBy(entry => entry.Value).ToList()
You said you wanted the most efficient method. The following code is the best I could find.
Hashtable hashtable = GetYourHashtable();
var result = new List<DictionaryEntry>(hashtable.Count);
foreach (DictionaryEntry entry in hashtable)
{
result.Add(entry);
}
result.Sort(
(x, y) =>
{
IComparable comparable = x.Value as IComparable;
if (comparable != null)
{
return comparable.CompareTo(y.Value);
}
return 0;
});
foreach (DictionaryEntry entry in result)
{
Console.WriteLine(entry.Key.ToString() + ":" + entry.Value.ToString());
}
I experimented with various different approaches using Linq, but the above method was about 25-50% faster.
Maybe this could work:
myhashtable.Keys.Cast<string>().Select(k => new List<object> { k, myhashtable[k] })
    .OrderBy(item => (int)item[1]);
This should give you a sequence of lists, with the nested lists containing exactly two elements, the key and the value, sorted by the value (second element).
Hashtable doesn't have a KeyValuePair<K, V> type; it enumerates DictionaryEntry items, so something like this could also work:
myhashtable.Cast<DictionaryEntry>().OrderBy(entry => entry.Value);
The immediate way that springs to mind is along the lines of what you have, except that you have a SortedList (or similar) that uses the original values (i.e. the integers) as keys and, as values, has a list of the original keys (i.e. the strings, if I understand correctly). There is a bit more faff involved in adding values (since you need to check whether a value already exists and add to its list if so, or create a new list otherwise). There may be better methods, but this is the one that immediately springs to mind...
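A sketch of the bucket idea, assuming string keys and int values as described in the question. HashtableByValue.Group is a hypothetical helper. Each distinct value maps to the list of keys that share it, so duplicate values no longer clash:

```csharp
using System.Collections;
using System.Collections.Generic;

public static class HashtableByValue
{
    // Buckets keys by their int value; SortedList keeps the buckets in value order.
    public static SortedList<int, List<string>> Group(Hashtable table)
    {
        var buckets = new SortedList<int, List<string>>();
        foreach (DictionaryEntry entry in table)
        {
            int value = (int)entry.Value;
            if (!buckets.TryGetValue(value, out List<string> keys))
            {
                keys = new List<string>();
                buckets.Add(value, keys);
            }
            keys.Add((string)entry.Key);
        }
        return buckets;
    }
}
```

Enumerating the result (outer loop over the buckets, inner loop over each list) yields every key/value pair ordered by value. SortedList<TKey, TValue> insertion is O(n) for keys that arrive out of order, so SortedDictionary<TKey, TValue> may be the better choice for a large table.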
