Dictionary object performance - c#

I have a possible implementation scenario where I need a dictionary object that will take 3 variables. A dialect, a query name and a query string. I should note at this stage that writing a separate class object is not an option.
My question is which of the following would perform better.
A) A single dictionary object that takes the first two variables in as a composite key e.g. "dialect,queryname" and the 3rd variable as the value.
private Dictionary<string, string>
B) A dictionary object that has another dictionary object as the value so the first variable would be the key of the primary dictionary object, the 2nd variable would be the key of the 2nd dictionary object and finally the 3rd variable would be the value of the second dictionary object.
private Dictionary<string, Dictionary<string, string>>
Seems obvious but the compiler is a mysterious thing so thought I should ask you guys.
Thanks

Just mucking around for my own amusement ..
Dictionary<string, string> md1 = new Dictionary<string,string>();
Dictionary<string, Dictionary<string, string>> md2 = new Dictionary<string, Dictionary<string, string>>();
Stopwatch st = new Stopwatch();
st.Start();
for (int i = 0; i < 2000000; i++)
{
md1.Add(i.ToString(), "blabla");
}
st.Stop();
Console.WriteLine(st.ElapsedMilliseconds);
st.Reset();
st.Start();
for (int i = 0; i < 2000000; i++)
{
md2.Add(i.ToString(), new Dictionary<string, string>());
}
st.Stop();
Console.WriteLine(st.ElapsedMilliseconds);
Console.ReadLine();
output:
831
1399

As long as you're sure that the key "dialect,queryname" is unique, I think the first solution is faster. In the second one, you'd have to do one more dictionary lookup, which is probably more costly than a string concatenation.
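To make the difference concrete, here is a minimal sketch of what a lookup looks like under each option. The class name QueryStoreSketch, the method names and the sample dialects and queries are made up for illustration; only Dictionary<TKey, TValue> and TryGetValue come from the framework.

```csharp
using System.Collections.Generic;

public static class QueryStoreSketch
{
    // Option A: one dictionary keyed by the composite "dialect,queryname" string.
    public static readonly Dictionary<string, string> Flat =
        new Dictionary<string, string>
        {
            { "mssql,getUsers", "SELECT TOP 10 * FROM Users" },
            { "mysql,getUsers", "SELECT * FROM Users LIMIT 10" }
        };

    // Option B: dialect -> (query name -> query string).
    public static readonly Dictionary<string, Dictionary<string, string>> Nested =
        new Dictionary<string, Dictionary<string, string>>
        {
            { "mssql", new Dictionary<string, string> { { "getUsers", "SELECT TOP 10 * FROM Users" } } },
            { "mysql", new Dictionary<string, string> { { "getUsers", "SELECT * FROM Users LIMIT 10" } } }
        };

    public static string FindFlat(string dialect, string name)
    {
        // One string concatenation plus one hash lookup.
        string query;
        return Flat.TryGetValue(dialect + "," + name, out query) ? query : null;
    }

    public static string FindNested(string dialect, string name)
    {
        // Two hash lookups, no concatenation.
        Dictionary<string, string> byName;
        string query;
        if (Nested.TryGetValue(dialect, out byName) && byName.TryGetValue(name, out query))
        {
            return query;
        }
        return null;
    }
}
```

Note that option A is only safe if neither the dialect nor the query name can contain the separator character.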

Why don't you use:
Dictionary<string, KeyValuePair<string, string>>
I think it is better than both.

This is not a matter of performance, as the two have completely different semantics.
The first gives you a way to use one object to find another object.
The second gives you a way to use one object to find another object, in which you can use yet another object to find a third object.
There is slightly different functionality in terms of how these can be later extended.
Most generally, I'd use Dictionary<Tuple<string, string>, string>. This would give me a composite key that is clearly a composite key.
Actually, that's not true, I'd create a new class. How is that not an option? Still, if it was homework and "do not create a new class" was part of the question, I'd use Dictionary<Tuple<string, string>, string>.
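A minimal sketch of the Tuple-keyed dictionary, assuming .NET 4.0 or later (where System.Tuple is available). The class name TupleKeySketch, the method names and the sample data are hypothetical. It relies on Tuple<T1, T2> implementing Equals and GetHashCode over its items, so two separately created tuples with the same strings hit the same entry.

```csharp
using System;
using System.Collections.Generic;

public static class TupleKeySketch
{
    public static Dictionary<Tuple<string, string>, string> Build()
    {
        var queries = new Dictionary<Tuple<string, string>, string>();
        queries.Add(Tuple.Create("mssql", "getUsers"), "SELECT TOP 10 * FROM Users");
        queries.Add(Tuple.Create("mysql", "getUsers"), "SELECT * FROM Users LIMIT 10");
        return queries;
    }

    public static string Find(Dictionary<Tuple<string, string>, string> queries,
                              string dialect, string name)
    {
        // A freshly created tuple compares equal to the stored key item by item.
        string query;
        return queries.TryGetValue(Tuple.Create(dialect, name), out query) ? query : null;
    }
}
```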
Edit:
class DialectQuery : IEquatable<DialectQuery>
{
    public string Dialect { get; private set; }
    public string Name { get; private set; }
    public DialectQuery(string dialect, string name)
    {
        Dialect = dialect;
        Name = name;
    }
    public bool Equals(DialectQuery other)
    {
        return other != null && Name == other.Name && Dialect == other.Dialect;
    }
    public override bool Equals(object other)
    {
        // Forward to the typed overload; "as" yields null for any other type.
        return Equals(other as DialectQuery);
    }
    public override int GetHashCode()
    {
        int dHash = Dialect.GetHashCode();
        return (dHash << 16 | dHash >> 16) ^ Name.GetHashCode();
    }
}
So far it behaves exactly the same as Tuple. Now though if I get a change request that dialects must be case-insensitive but query names case-sensitive, or that dialects are codes and therefore require invariant comparison but names are human-input and therefore require culture-aware comparison, or anything else, I've got two simple changes to make.
YAGNI doesn't apply, it's not coding a massive object "in case you need it", it's defining a good "okay, I probably don't need it, but if I do it'll go here" point.
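As an illustration of those "two simple changes", here is a sketch of the first change request (case-insensitive dialects, case-sensitive names). The class name CaseInsensitiveDialectQuery is made up to keep it separate from the class above, and the choice of ordinal comparison is an assumption; a real requirement might call for invariant or culture-aware comparison instead.

```csharp
using System;

public sealed class CaseInsensitiveDialectQuery : IEquatable<CaseInsensitiveDialectQuery>
{
    public string Dialect { get; private set; }
    public string Name { get; private set; }

    public CaseInsensitiveDialectQuery(string dialect, string name)
    {
        Dialect = dialect;
        Name = name;
    }

    public bool Equals(CaseInsensitiveDialectQuery other)
    {
        // Change 1: dialects compare ignoring case; names stay case-sensitive.
        return other != null
            && Name == other.Name
            && string.Equals(Dialect, other.Dialect, StringComparison.OrdinalIgnoreCase);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CaseInsensitiveDialectQuery);
    }

    public override int GetHashCode()
    {
        // Change 2: the hash has to agree with Equals, so the dialect is
        // hashed with the same case-insensitive comparer.
        int dHash = StringComparer.OrdinalIgnoreCase.GetHashCode(Dialect);
        return (dHash << 16 | dHash >> 16) ^ Name.GetHashCode();
    }
}
```

Both changes live in one place, which is the point being made: callers that use the type as a dictionary key do not need to change.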

Related

Handle Collision using Hashtable Class in c#

In the below scenario how can I handle or implement collision in C# using the Hashtable class? If the 'Key' value is same I am getting an "Argument Exception".
static void Main(string[] args)
{
Console.Write("Enter a string:");
string input = Console.ReadLine();
checkString(input);
Console.ReadLine();
}
static void checkString(string input)
{
Hashtable hashTbl = new Hashtable();
foreach(char c in input)
{
hashTbl.Add(c.GetHashCode(), c);
}
printHash(hashTbl);
}
static void printHash(Hashtable hash)
{
foreach(int key in hash.Keys)
{
Console.WriteLine("Key: {0} Value: {1}",key,hash[key]);
}
}
My Expectation:
What do I need to do in the 'Value' argument to get around the 'Collision' issue. I am trying to check if the string consists of unique characters.
It seems you are misunderstanding how the Hashtable class works (it has also been superseded by the generic Dictionary<K,V> since .NET 2.0 in 2005, so prefer that instead, although its behavior here is identical).
It seems you're expecting it to be your job to get an object's hashcode and add it to the hashtable. It isn't. All you need to do is add the object you want to use as key (each character), and the internal implementation will extract the hashcode.
However, what you're actually doing won't work even if you added the key object yourself. You're taking an input string (say, "test"), and for each character, you're adding it to the hashtable as a key. But since keys are, by definition, unique, you'll be adding the character 't' twice (it shows up twice in the input), so you'll get an exception.
I am trying to check if the string consists of unique characters.
Then you need keys only without values, that's what HashSet<T> is for.
var chars = new HashSet<char>();
foreach (char c in input)
{
if (chars.Contains(c))
{
// c is not unique
}
else
{
chars.Add(c);
}
}
But I'd prefer using LINQ in this case:
var hasUniqueChars = input.Length == input.Distinct().Count();
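If you want to stop at the first repeated character instead of scanning the whole string, a small variation is possible, relying on HashSet<T>.Add returning false when the item is already in the set. The class and method names below are made up for this sketch.

```csharp
using System.Collections.Generic;

public static class UniqueCharsSketch
{
    public static bool HasUniqueChars(string input)
    {
        var seen = new HashSet<char>();
        foreach (char c in input)
        {
            // Add returns false if c was already present, so no separate
            // Contains call is needed and we can stop at the first repeat.
            if (!seen.Add(c))
            {
                return false;
            }
        }
        return true;
    }
}
```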
As previously stated you should probably switch to the Dictionary<TKey, TValue> class for this.
If you want to get around the collision issue, then you have to check whether the key already exists.
Dictionary<string, object> dictValues = new Dictionary<string, object>();
Then you can check for a collision:
if (dictValues.ContainsKey(YourKey))
{
    /* ... your collision handling here ... */
}
else
{
    // No collision
}
Another possibility would be, if you are not interested in preserving previous values for the same key:
dictValues[YourKey] = YourValue;
This will add the key entry if it is not there already. If it is, it will overwrite its value with the given input.
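The difference between the two calls can be seen in a short sketch. The class name, the key "t" and the values are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class AddVersusIndexerSketch
{
    public static string Demo()
    {
        var dictValues = new Dictionary<string, object>();
        dictValues.Add("t", 1);

        bool addThrew = false;
        try
        {
            // Adding the same key a second time throws ArgumentException.
            dictValues.Add("t", 2);
        }
        catch (ArgumentException)
        {
            addThrew = true;
        }

        // The indexer adds the key if missing and overwrites it otherwise.
        dictValues["t"] = 3;

        return addThrew + "," + dictValues["t"];
    }
}
```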

How to get the last value entered [duplicate]

My dictionary:
Dictionary<double, string> dic = new Dictionary<double, string>();
How can I return the last element in my dictionary?
What do you mean by Last? Do you mean Last value added?
The Dictionary<TKey,TValue> class is an unordered collection. Adding and removing items can change what is considered to be the first and last element. Hence there is no way to get the Last element added.
There is an ordered dictionary class available in the form of SortedDictionary<TKey,TValue>. But this will be ordered based on comparison of the keys and not the order in which values were added.
EDIT
Several people have mentioned using the following LINQ style approach
var last = dictionary.Values.Last();
Be very wary about using this method. It will return the last value in the Values collection. This may or may not be the last value you added to the Dictionary. It's probably as likely to not be as it is to be.
Dictionaries are unordered collections - as such, there is no concept of a first or last element. If you are looking for a class that behaves like a dictionary but maintains the insertion order of items, consider using OrderedDictionary.
If you are looking for a collection that sorts the items, consider using SortedDictionary<TKey,TValue>.
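A minimal sketch of the OrderedDictionary option, assuming the non-generic System.Collections.Specialized.OrderedDictionary is acceptable (keys and values are stored as object, so a cast is needed on the way out). The class name and the sample entries are made up.

```csharp
using System.Collections.Specialized;

public static class InsertionOrderSketch
{
    public static string LastInserted()
    {
        var od = new OrderedDictionary();
        od.Add(3.5, "first");
        od.Add(1.0, "second");
        od.Add(2.25, "third");

        // OrderedDictionary has two indexers: one by key (object) and one by
        // position (int). Passing an int selects the positional one.
        return (string)od[od.Count - 1];
    }
}
```

Be careful when the keys themselves are ints: an int argument is then treated as a position, not as a key.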
If you have an existing dictionary, and you are looking for the 'last' element given some sort order, you could use linq to sort the collection, something like:
myDictionary.Values.OrderBy( x => x.Key ).Last();
Be wary of using Dictionary.Keys.Last(): the key collection of a plain Dictionary is not sorted, so the value you get may not be the value you expect.
I know this question is too old to get any upvotes, but I didn't like any of the answers so will post my own in the hopes of offering another option to future readers.
Assuming you want the highest key value in a dictionary, not the last inserted:
The following did not work for me on .NET 4.0:
myDictionary.Values.OrderBy( x => x.Key ).Last();
I suspect the problem is that the 'x' represents a value in the dictionary, and a value has no key (the dictionary stores the key, the dictionary values do not). I may also be making a mistake in my usage of the technique.
Either way, this solution would be slow for large dictionaries, probably O(n log n) for CS folks, because it is sorting the entire dictionary just to get one entry. That's like rearranging your entire DVD collection just to find one specific movie.
var lastDicVal = dic.Values.Last();
is well established as a bad idea. In practice, this solution may return the last value added to the dictionary (not the highest key value), but in software engineering terms that is meaningless and should not be relied upon. Even if it works every time for the rest of eternity, it represents a time bomb in your code that depends on library implementation detail.
My solution is as follows:
var lastValue = dic[dic.Keys.Max()];
Keys.Max() is much faster than sorting: O(n) instead of O(n log n).
If performance is important enough that even O(n) is too slow, the last inserted key can be tracked in a separate variable used to replace dic.Keys.Max(), which will make the entire lookup as fast as it can be, or O(1).
Note: Use of double or float as a key is not best practice and can yield surprising results which are beyond the scope of this post. Read about "epsilon" in the context of float/double values.
If you're using .NET 3.5, look at:
dic.Keys.Last()
If you want a predictable order, though, use:
IDictionary<int, string> dic = new SortedDictionary<int, string>();
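A short sketch of what "predictable order" means here: a SortedDictionary enumerates in ascending key order regardless of insertion order, so the last pair is the one with the highest key. The class name and sample entries are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SortedOrderSketch
{
    public static string ValueForHighestKey()
    {
        IDictionary<int, string> dic = new SortedDictionary<int, string>();
        dic.Add(20, "twenty");
        dic.Add(5, "five");
        dic.Add(12, "twelve");

        // Enumeration is ordered by key, so Last() is the pair with key 20.
        return dic.Last().Value;
    }
}
```

LINQ's Last() still walks the whole collection here, so this buys a predictable answer rather than a faster one.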
Instead of using:
Dictionary<double, string>
...you could use:
List<KeyValuePair<double, string>>
This would allow you to use the indexer to access the element by order instead of by key.
Consider creating a custom collection whose Add method records the item being added. This would set a private field containing the last added key, value, or both, depending on your requirements.
Then have a Last() method that returns this. Here's a proof of concept class to show what I mean (please don't knock the lack of interface implementation etc- it is sample code):
public class LastDictionary<TKey, TValue>
{
private Dictionary<TKey, TValue> dict;
public LastDictionary()
{
dict = new Dictionary<TKey, TValue>();
}
public void Add(TKey key, TValue value)
{
LastKey = key;
LastValue = value;
dict.Add(key, value);
}
public TKey LastKey
{
get; private set;
}
public TValue LastValue
{
get; private set;
}
}
From the docs:
For purposes of enumeration, each item
in the dictionary is treated as a
KeyValuePair structure representing a
value and its key. The order in which
the items are returned is undefined.
So, I don't think you can rely on Dictionary to return the last element.
Use another collection. Maybe SortedDictionary ...
If you just want the value, this should work (assuming you can use LINQ):
dic.Values.Last()
You could use:
dic.Last()
But a dictionary doesn't really have a last element (the pairs inside aren't ordered in any particular way). The last item will always be the same, but it's not obvious which element it might be.
With .Net 3.5:
string lastItem = dic.Values.Last();
double lastKey = dic.Keys.Last();
...but keep in mind that a dictionary is not ordered, so you can't count on the fact that the values will remain in the same order.
A dictionary isn't meant to be accessed in order, so first, last have no meaning. Do you want the value indexed by the highest key?
Dictionary<double, string> dic = new Dictionary<double, string>();
double highest = double.MinValue;
string result = null;
foreach(double d in dic.Keys)
{
if(d > highest)
{
result = dic[d];
highest = d;
}
}
Another option is to use the Count property to work out the position of the final element (see ICollection.Count Property for more information).
Note that the Dictionary indexer takes a key, not a position, so dic[dic.Count - 1] would look up the key Count - 1 rather than the final element. The position has to go through ElementAt (which needs using System.Linq):
Dictionary<double, string> dic = new Dictionary<double, string>();
// ... add items ...
var lastElementIndex = dic.Count - 1;
var lastElement = dic.ElementAt(lastElementIndex).Value; // throws if dic is empty
Keep in mind that this returns the last VALUE in enumeration order, not the key, and that a Dictionary does not guarantee that order.

When to write code to anticipate looping and when to specifically do something 'X' times

If I know that a method must perform an action a certain number of times, such as retrieving data, should I write code to specifically do this the number of times that is required, or should my code be able to anticipate later changes?
For instance, say I was told to write a method that retrieves 2 values from a dictionary (I'll call it Settings here) and returns them, using known keys that are provided:
public Dictionary<string, string> GetSettings()
{
    const string keyA = "address"; //I understand 'magic strings' are bad, bear with me
    const string keyB = "time";
    Dictionary<string, string> retrievedSettings = new Dictionary<string, string>();
    //should I add the keys to a list and then iterate through the list?
    List<string> listOfKeys = new List<string>() { keyA, keyB };
    foreach (string key in listOfKeys)
    {
        if (Settings.ContainsKey(key))
        {
            string value = Settings[key];
            retrievedSettings.Add(key, value);
        }
    }
    //or should I just get the two values directly from the dictionary like so
    //(instead of the loop above, not in addition to it)
    if (Settings.ContainsKey(keyA))
    {
        retrievedSettings.Add(keyA, Settings[keyA]);
    }
    if (Settings.ContainsKey(keyB))
    {
        retrievedSettings.Add(keyB, Settings[keyB]);
    }
    return retrievedSettings;
}
The reason why I ask is that code repetition is always a bad thing (i.e. DRY), but at the same time, more experienced programmers have told me that there is no need to write the logic to anticipate larger looping if the action only needs to be performed a known number of times.
I would extract a method that takes the keys as parameter:
private Dictionary<string, string> GetSettings(params string[] keys)
{
    var retrievedSettings = new Dictionary<string, string>();
    foreach (string key in keys)
    {
        if (Settings.ContainsKey(key))
            retrievedSettings.Add(key, Settings[key]);
    }
    return retrievedSettings;
}
You can now use this method like this:
public Dictionary<string, string> GetSettings()
{
return GetSettings(keyA, keyB);
}
I would choose this approach because it makes your main method trivial to understand: "Aha, it gets the settings for keyA and for keyB".
I would use this approach even when I am sure that I will never need to get more than these two keys. In other words, this approach has been chosen, not because it anticipates later changes but because it better communicates intent.
However, with LINQ, you wouldn't really need that extracted method. You could simply use this:
public Dictionary<string, string> GetSettings()
{
return new [] { keyA, keyB }.Where(x => Settings.ContainsKey(x))
.ToDictionary(x => x, x => Settings[x]);
}
The DRY principle does not necessarily mean that every line of code in your program should be unique. It simply means that you should not have large regions of code spread out throughout your program that do the same thing.
Option number 1 works well when you have a large number of items to search for, but has the downside of making the code slightly less trivial to read.
Option number 2 works well when you have a small number of options. It is more straightforward and is actually more efficient.
Since you only have two settings, I would definitely go with option number 2. Making decisions such as these in expectation of future changes is a waste of effort. I have found this article to be quite helpful in illustrating the perils of being too concerned with non-existent requirements.

Get A random keyValue from Hashtable

I have a Hashtable whose contents I don't know.
Now I want to get one key and value from it.
I use a Hashtable because of its speed: it holds over 4,500,000 key/value pairs, so I can't use GetEnumerator, as it reduces program speed.
You use a List<TKey>:
Dictionary<string, string> dict = ... your hashtable which could be huge
List<string> keys = new List<string>(dict.Keys);
int size = dict.Count;
Random rand = new Random();
string randomKey = keys[rand.Next(size)];
We are just creating a List<TKey> whose elements are pointing to the same location in memory as the keys of your hashtable and then we pick a random element from this list.
And if you want to get a random element value from the hashtable, this should be pretty straightforward given a random key.
string randomElement = dict[randomKey];
"I can't use GetEnumerator; it reduces program speed"
Well that's a problem. You've accepted an answer which does iterate over all the entries, and also copies the keys into a new list, so it's not clear whether you've abandoned that requirement.
An approach which will certainly be more efficient in memory and potentially in speed as well is to iterate over the whole dictionary, but retaining a random element at any one time, with an optimization for collections where we can obtain the count cheaply. Here's an extension method which will do that for any generic sequence in .NET:
public static T RandomElement<T>(this IEnumerable<T> source,
Random rng)
{
// Optimize for the "known count" case.
ICollection<T> collection = source as ICollection<T>;
if (collection != null)
{
// ElementAt will optimize further for the IList<T> case
return source.ElementAt(rng.Next(collection.Count));
}
T current = default(T);
int count = 0;
foreach (T element in source)
{
count++;
if (rng.Next(count) == 0)
{
current = element;
}
}
if (count == 0)
{
throw new InvalidOperationException("Sequence was empty");
}
return current;
}
So for a Dictionary<TKey, TValue> you'd end up with a KeyValuePair<TKey, TValue> that way - or you could project to Keys first:
var key = dictionary.Keys.RandomElement(rng);
(See my article on Random for gotchas around that side of things.)
I don't believe you'll be able to do any better than O(n) if you want a genuinely pseudo-random key, rather than just an arbitrary key (which you could get by taking the first one in the sequence, as stated elsewhere).
Note that copying the keys to a list as in Darin's answer allows you to get multiple random elements more efficiently, of course. It all depends on your requirements.
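A rough sketch of that trade-off: pay for one O(n) copy of the keys, after which every further pick is O(1). The class name RandomKeysSketch, the method Sample and its howMany parameter are invented for this example, and picks are made with replacement, so the same key can come back more than once.

```csharp
using System;
using System.Collections.Generic;

public static class RandomKeysSketch
{
    public static List<TKey> Sample<TKey, TValue>(
        Dictionary<TKey, TValue> dict, Random rng, int howMany)
    {
        // One O(n) copy of the keys up front...
        var keys = new List<TKey>(dict.Keys);
        var picked = new List<TKey>();
        if (keys.Count == 0)
        {
            return picked; // nothing to pick from
        }

        for (int i = 0; i < howMany; i++)
        {
            // ...then each pick is an O(1) index into the list.
            picked.Add(keys[rng.Next(keys.Count)]);
        }
        return picked;
    }
}
```

If only a single random element is ever needed, the copy is wasted work and the streaming approach above is the better fit.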
How random does the random key have to be?
Hash tables don't define an order for their items to be stored in, so you could just grab the first item. It's not really random, but it's not insertion order or sorted order either. Would that be random enough?
Dictionary<string, string> dict = GetYourHugeHashTable();
KeyValuePair<string, string> randomItem = dict.First();
DoAComputation(randomItem.Key, randomItem.Value);
dict.Remove(randomItem.Key);
With LINQ you can do:
Dictionary<string, string> dicto = new Dictionary<string, string>();
Random rand = new Random();
int size = dicto.Count;
int randNum = rand.Next(0, size);
KeyValuePair<string, string> randomPair = dicto.ElementAt( randNum );
string randomVal = randomPair.Value;
For instance,
string tmp = dicto.ElementAt( 30 ).Value;
Would copy the value of the thirty-first item (index 30) in dicto to the string tmp.
Internally, I think it walks through the key/value pairs one at a time until it reaches the requested index, instead of copying them all, so you don't need to load all the elements into memory.
I'm not sure what you meant by not knowing what the content is.
You don't know the types in the KeyValuePair of the dicto?
Or just don't know what values will be in the dicto?
Hashtable.Keys will give you a pointer to the internal list of keys. That is speedy. Also removing an item from a Hashtable is an O(1) operation, so this will also be speedy, even with large amounts of items.
You could do a loop like this (I see no reason to use random in your question);
var k = hashtable.Keys; // Live view: reflects the actual contents, even if changes occur
while (k.Count > 0)
{
    var i = k.Cast<object>().First(); // needs using System.Linq
    Process(i);
    hashtable.Remove(i);
}
Well, if you know which version of the .NET BCL you'll be targeting (i.e., if it's fixed), you could always plumb the internals of Dictionary<TKey, TValue> to figure out how it stores its keys privately and use that to pluck a random one.
For example, using the version of Mono I currently have installed on my work laptop, I see that the Dictionary<TKey, TValue> type has a private field called keySlots (I assume this will be different for you if you're on Windows). Using this knowledge you could implement a function looking something like this:
static readonly Dictionary<Type, FieldInfo> KeySlotsFields = new Dictionary<Type, FieldInfo>();
public static KeyValuePair<TKey, TValue> GetRandomKeyValuePair<TKey, TValue>(this Random random, Dictionary<TKey, TValue> dictionary)
{
// Here's where you'd get the FieldInfo that you've identified
// for your target version of the BCL.
FieldInfo keySlotsField = GetKeySlotsField<TKey, TValue>();
var keySlots = (TKey[])keySlotsField.GetValue(dictionary);
var key = (TKey)keySlots[random.Next(keySlots.Length)];
// The keySlots field references an array with some empty slots,
// so we need to loop until we come across an existing key.
while (key == null)
{
key = (TKey)keySlots[random.Next(keySlots.Length)];
}
return new KeyValuePair<TKey, TValue>(key, dictionary[key]);
}
// This happens to work for me on Mono; you'd almost certainly need to
// rewrite it for different platforms.
public static FieldInfo GetKeySlotsField<TKey, TValue>()
{
// Cache per closed dictionary type, since the FieldInfo belongs to Dictionary<TKey, TValue>.
Type dictionaryType = typeof(Dictionary<TKey, TValue>);
FieldInfo keySlotsField;
if (!KeySlotsFields.TryGetValue(dictionaryType, out keySlotsField))
{
KeySlotsFields[dictionaryType] = keySlotsField = dictionaryType.GetField("keySlots", BindingFlags.Instance | BindingFlags.NonPublic);
}
return keySlotsField;
}
This could be an appropriate solution in your case, or it could be a horrible idea. Only you have enough context to make that call.
As for the example method above: I personally like adding extension methods to the Random class for any functionality involving randomness. That's just my choice; obviously you could go a different route.

Generic way to send an array collection containing only a part of a more complex structure

Let's say a program like this:
class MyClass
{
public int Numbers;
public char Letters;
}
class Program
{
static void Main()
{
var mc = new MyClass[5];
for (var i = 0; i < 5; i++)
{
mc[i] = new MyClass(); // array slots of a class type start out null
mc[i].Numbers = i + 1;
mc[i].Letters = (char) (i + 65);
}
}
}
Now, let's suppose an 'X' method that requires ALL the numbers contained in the object mc, in a separate array, that's sent as a parameter.
My first idea is a for, a new integers array, and copy one by one onto its respective position. But, what if the MyClass gets different, now it has strings and floats, and I wanna pull out the strings, now the for has to be completely redefined in its inside part to create the needed array for another 'X' method.
I know of cases where LINQ helps a lot, for example, generics for Sum, Average, Count and other numeric functions, and of course, its combination with lambda expressions.
I'm wondering if something similar exists to build the above arrays from MyClass (and from other classes, of course) in a faster, generic way?
If you want to use LINQ, you can do something like the following:
int [] numbers = mc.Select<MyClass, int>(m => m.Numbers).ToArray();
To make it more generic than that, it gets a bit more complicated, and you may need reflection, or dynamic objects. A simple example with reflection would be:
private TValue[] ExtractFields<TClass, TValue>(TClass[] classObjs, string fieldName)
{
FieldInfo fInfo = typeof(TClass).GetField(fieldName, BindingFlags.Public | BindingFlags.Instance);
if (fInfo != null && fInfo.FieldType.Equals(typeof(TValue)))
return classObjs.Select<TClass, TValue>(c => (TValue)fInfo.GetValue(c)).ToArray();
else
throw new NotSupportedException("Unidentified field, or different field type");
}
And then just call it like:
int [] fields = ExtractFields<MyClass, int>(mc, "Numbers");
If you are using C# 4.0, then you may use dynamic
class MyClass
{
public dynamic Numbers;
public char Letters;
}
EDIT: based on comments
I am not sure if this is what you want:
int[] arr = mc.Select(a => a.Numbers).ToArray<int>();
or without casting
int[] arr = mc.Select(a => a.Numbers).ToArray();
Why not just use Dictionary<int, char>, or if the data type is unknown then simply Dictionary<object, object>
If your goal is to generate a new array which is detached from the original array, but contains data copied from it, the most generic thing you could do would be to define a method like:
T[] my_array; // The array which holds the real things
U[] CopyAsConvertedArray<U>(Func<T,U> ConversionMethod);
That would allow one to generate a new array which extracts items from the original using any desired method.
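One way that signature could be fleshed out is as a static helper that takes the source array as a parameter. The Item class below is a hypothetical stand-in for MyClass, and ProjectionSketch is an invented container name; this is a sketch of the idea rather than the answerer's actual implementation.

```csharp
using System;

// Hypothetical stand-in for the MyClass type from the question.
public class Item
{
    public int Numbers;
    public char Letters;
}

public static class ProjectionSketch
{
    public static U[] CopyAsConvertedArray<T, U>(T[] source, Func<T, U> conversionMethod)
    {
        // The result is a new array, detached from the original.
        var result = new U[source.Length];
        for (int i = 0; i < source.Length; i++)
        {
            result[i] = conversionMethod(source[i]);
        }
        return result;
    }
}
```

It would be called as ProjectionSketch.CopyAsConvertedArray(mc, m => m.Numbers), with both type arguments inferred. The framework's Array.ConvertAll does the same job with a Converter<TInput, TOutput> delegate, so in practice the helper may not be needed at all.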
