What is the optimal data structure for storing objects with a string key and a bool auxiliary value? - c#

I need a data structure like the one below, but I need to be able to change the bool value. The other two stay as they were when they were initialized. What would you use for best performance?
Dictionary<string, (object, bool)> dic = new Dictionary<string, (object, bool)>();
I was thinking of a Hashtable. But a Hashtable is like a dictionary with key/value pairs. The object and bool in my example are conceptually not a key/value pair, because other values of the external dictionary can have the same object (or better yet ... object type). I don't want someone looking at my code later on to think that the object and bool are more related than they really are.
EDIT: object in this example is just a placeholder. In reality it's a complex object with other objects in it and so on. The procedure before this one creates a bunch of these objects, and some of them are deep copies of the others. They are passed to this procedure. Here all of the objects are named by some rules and stored in the dictionary. The names are obviously unique. The procedure that comes after will take this dictionary and set the bool value on and off based on the values in the objects themselves and on the values of other bools. That procedure will be recursive until some state is reached.
Number of objects (or dic. entries) is arbitrary but expected to be >100 && <500. Time complexity is O(n).
I am targeting .NET7 (standard).

but I need to be able to change the bool value.
You can just reassign value for the key:
var tuples = new Dictionary<string, (object Obj, bool Bool)>
{
{ "1", (new object(), true) }
};
tuples["1"] = (tuples["1"].Obj, false); // or tuples["1"] = (tuples["1"].Item1, false);
Or
if (tuples.TryGetValue("1", out var c))
{
tuples["1"] = (c.Obj, false);
}
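Personally I would leave it at that, but for really high-performance scenarios you can look into CollectionsMarshal (namespace System.Runtime.InteropServices; Unsafe is in System.Runtime.CompilerServices) instead of the second snippet: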
ref var v = ref CollectionsMarshal.GetValueRefOrNullRef(tuples, "1");
if (!Unsafe.IsNullRef(ref v))
{
v.Bool = false;
}
A bit more info - here.

For the 'performance' aspect:
The .NET Dictionary uses hashes to look up the item you need, which is very fast (comparable to a Hashtable). I don't expect many performance issues related to this, or at least nothing that can be improved on with other data structures.
Also, you shouldn't worry about performance unless you are doing things a million times in a row + it turns out (in practice) that something is taking a measurable amount of time.
For the 'changing a bool' aspect:
... that is quite a long story.
There are 2 tuple variants in .NET:
The value tuple, created by doing var x = (myObj, myBool), like you are doing.
The x is a struct, and therefore a Value Type. You can actually change x.Item1 or x.Item2 to a new value just fine.
However... if you put x into a Dictionary then you actually put a copy of x (with a copy of its values) into the dictionary, because that is the nature of value types.
When you retrieve it again from the Dictionary, yet another copy is made - which makes modifying the actual tuple inside the Dictionary impossible; any attempt to do so would only modify the last copy you got.
Side story: The C# compiler knows this, which is why it refuses to compile code like dic[yourKey].Item2 = newBool; such code wouldn't do what you might hope it would do. You're basically telling the compiler to create a copy, modify the copy, and then... discard the copy. The compiler requires a variable to store the copy before the rest can even start, but we provided no variable.
The Tuple generic class, or rather a range of generic classes, an instance of which can be created using calls like var x = Tuple.Create(myObj, myBool). These classes however forbid that you change any of their properties, they are always readonly. Tuple class instances can be put in a Dictionary, but they will still be readonly.
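To make the copy behaviour of value tuples concrete, here is a minimal sketch. The class name TupleCopySketch and the tuple element names Obj and Flag are made up for illustration; they are not from the question.

```csharp
using System;
using System.Collections.Generic;

public static class TupleCopySketch
{
    // Returns the flag stored in the dictionary after each of two update attempts.
    public static (bool AfterCopyEdit, bool AfterWriteBack) Run()
    {
        var dict = new Dictionary<string, (object Obj, bool Flag)>
        {
            ["abc"] = (new object(), true)
        };

        // The indexer hands back a copy of the struct, so this edit stays local.
        var copy = dict["abc"];
        copy.Flag = false;
        bool afterCopyEdit = dict["abc"].Flag;

        // Writing the modified copy back replaces the stored value.
        dict["abc"] = copy;
        bool afterWriteBack = dict["abc"].Flag;

        return (afterCopyEdit, afterWriteBack);
    }
}
```

Calling TupleCopySketch.Run() should show that the stored flag only changes once the modified copy is assigned back through the indexer.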
So what options are there really to 'modify a value in a tuple' in a Dictionary?
Keep using a value tuple, but accept that in order to "change" the tuple inside the Dictionary you'll have to make a new instance (either a copy, or from scratch), set it to the properties that you want, and put that instance (or actually a copy...) into the dictionary:
// initialize it
var dict = new Dictionary<string, (object, bool)>();
var obj = new object();
dict["abc"] = (obj, true);
// change it
var tmpTuple = dict["abc"]; // get copy
tmpTuple.Item2 = false; // alter copy
dict["abc"] = tmpTuple; // store another copy
// or if you want to avoid the tmp variable
dict["abc"] = (dict["abc"].Item1, false);
Use a custom class instead of the value tuple or a Tuple class, and then put that into the Dictionary:
public class MyPair
{
public object O { get; set; }
public bool B { get; set; }
}
// initialize it
var dict = new Dictionary<string, MyPair>();
var obj = new object();
dict["abc"] = new MyPair { O = obj, B = true };
// change it
dict["abc"].B = false;
So both types of Tuples are OK for objects that you don't want to do a lot with. But both have certain limits in their usage, and sooner or later you may need to start using classes.

Related

SortedSet Enumeration not working as expected

Let's say ClassA is an arbitrary class and I am creating a SortedSet of ClassA. I am storing the enumerator in a dictionary, but whenever I try to access it, Current is always null:
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new Dictionary<int, SortedSet<ClassA>.Enumerator>();
map[1] = set.GetEnumerator();
map[1].MoveNext();
var val = map[1].Current;
//Why val is null ???
This happens because SortedSet<T>.Enumerator is a struct. Each time you use the dictionary's indexer to retrieve the enumerator, you get a new copy of it. So even though you call MoveNext() on that copy, the next time you get a copy of the enumerator, it does not have any value for Current.
Interestingly, because of a quirk in the exact implementation of that struct, each copy of the enumerator gets the same reference-type object to track the state of the enumeration (a stack), and so the MoveNext() method seems to work (i.e. it returns true the first time you call it, but false any subsequent time).
There are at least four options for handling this correctly…
Retrieve the copy into a variable, and use the variable instead of the dictionary:
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new Dictionary<int, SortedSet<ClassA>.Enumerator>();
map[1] = set.GetEnumerator();
var e = map[1];
e.MoveNext();
var val = e.Current;
Note that in this example, the dictionary's copy of the enumerator still will not have the Current value you want. You would have to set the dictionary's copy back again after calling MoveNext() to preserve that: map[1] = e;
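The write-back step can be sketched as follows. This is only an illustration: it uses string in place of ClassA (a SortedSet needs comparable elements anyway), and the class and method names are hypothetical.

```csharp
using System.Collections.Generic;

public static class EnumeratorWriteBackSketch
{
    public static string FirstViaWriteBack()
    {
        var set = new SortedSet<string> { "a" };
        var map = new Dictionary<int, SortedSet<string>.Enumerator>
        {
            [1] = set.GetEnumerator()
        };

        var e = map[1];   // local copy of the struct enumerator
        e.MoveNext();     // advances the local copy only
        map[1] = e;       // store the advanced copy back in the dictionary

        return map[1].Current;  // reads from the updated stored copy
    }
}
```

Every later MoveNext() needs the same read, advance, write-back sequence, which is why the other options are usually less error-prone.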
Use an array instead of a dictionary:
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new SortedSet<ClassA>.Enumerator[2];
map[1] = set.GetEnumerator();
map[1].MoveNext();
var val = map[1].Current;
Other than the difference in the declaration and initialization of map, this works exactly as you have the code now. This is because indexed elements of an array are variables, rather than going through an indexer as the indexing syntax would with any other collection. So you are operating directly on the copy of the enumerator stored in the array, rather than a fresh copy returned by an indexer.
Of course, this would only work if the key values were constrained enough to make it feasible to allocate an array large enough to hold all possibilities for the key.
Declare a reference-type wrapper to contain the value-type enumerator:
class E<T>
{
public SortedSet<T>.Enumerator Enumerator;
public E(SortedSet<T>.Enumerator enumerator)
{
Enumerator = enumerator;
}
}
then…
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new Dictionary<int, E<ClassA>>();
map[1] = new E<ClassA>(set.GetEnumerator());
map[1].Enumerator.MoveNext();
var val = map[1].Enumerator.Current;
In this example, the dictionary is just returning the reference to the wrapper object, ensuring just a single copy of the enumerator (which is stored in that wrapper object rather than the dictionary). Thus every time you access the object through the dictionary, you get the same copy.
Of course, you wind up having to go through the Enumerator field of the wrapper. It's a bit clumsy. But it would work.
Store the enumerators in an array, but index through a dictionary:
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new Dictionary<int, int>();
var a = new SortedSet<ClassA>.Enumerator[1];
map[1] = 0;
a[map[1]] = set.GetEnumerator();
a[map[1]].MoveNext();
var val = a[map[1]].Current;
This blends the two previous options. The array is used to store the actual enumerators, so that you can address them as variables, but the dictionary is used as a level of indirection to the array, so that you can refer to the enumerators via whatever keys you like.
Obviously, in a real application you would initialize the array by enumerating the original collections, storing their enumerators in the array and mapping the original key to the array index in the dictionary as you go.
The extra indirection is a little clumsy, as in the wrapper option, but not quite as bad as that, and solves the array-size concern of the array-based option.
Arguably, your question is a duplicate of this question: List.All(e => e.MoveNext()) doesn't move my enumerators on
It is definitely closely related to that one, as well as this one: Details on what happens when a struct implements an interface

Setting dictionary key and value types with variables

I am currently struggling to create a dictionary. I want to create it so that it can be used in multiple situations. However, these situations differ in their key and value types. So while you normally do:
Dictionary<int, string> Something = new Dictionary<int, string>();
I want to do something like:
Dictionary<variable1, variable2> ..............
It doesn't matter much what variable1 is. It can be a string that stores 'string' or 'int' as its value. I could also use variable1.GetType() to determine the type. Either way would work for me. But the way I did it above is simply incorrect. There must be another way to set the key and value type based on variables... right?
One idea that just occurred to me is to use ifs to check what the type is and, based on the type, make the dictionary use that type. But with the number of types, that's going to be a lot of ifs, and I feel like there has to be a better way.
Searching hasn't helped me much. Well, I learned some other things, but found no solution to my problem. In every single case, the dictionary's TKey and TValue have been set manually, while I want to set them with a variable that I take from some source.
There must be another way to set the key and value type based on variables... right?
Yes, there is. You can make a helper method that creates a dictionary, example:
public static Dictionary<K, V> CreateDictionaryFor<K, V>(K key, V value)
{
return new Dictionary<K, V>();
}
Then, you can use it with variable1 and variable2:
var dictionary = CreateDictionaryFor(variable1, variable2);
You can try doing Dictionary<object, object>.
That way you can pass whatever you need to pass and check the type as needed.
var dict = new Dictionary<object, object>();
dict.Add(45, "dkd");
A possibility would be to encapsulate the dictionary in a new class, and create the dictionary via a generic method:
public class GenericDictionary
{
private IDictionary m_dictionary;
public bool Add<TA, TB>(TA key, TB value)
{
try
{
if (m_dictionary == null)
{
m_dictionary = new Dictionary<TA, TB>();
}
//check types before adding, instead of using try/catch
m_dictionary.Add(key, value);
return true;
}
catch (Exception)
{
//wrong types were added to an existing dictionary
return false;
}
}
}
Of course the code above needs some improvements (no exception when adding wrong types, additional methods implementing the dictionary methods you need), but the idea should be clear.
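If the key and value types really are only available as Type objects at run time (for example from variable1.GetType()), one further option is reflection. The following is a rough sketch rather than a recommendation; the RuntimeDictionaryFactory name is made up, and the result is exposed through the non-generic IDictionary because the compiler cannot know the type arguments.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class RuntimeDictionaryFactory
{
    // Builds a Dictionary<keyType, valueType> from Type objects known only at run time.
    public static IDictionary Create(Type keyType, Type valueType)
    {
        // Close the open generic type Dictionary<,> over the supplied type arguments.
        Type closed = typeof(Dictionary<,>).MakeGenericType(keyType, valueType);
        return (IDictionary)Activator.CreateInstance(closed);
    }
}
```

For example, RuntimeDictionaryFactory.Create(typeof(int), typeof(string)) gives back an object that is a Dictionary<int, string> underneath; adding a key or value of the wrong type through IDictionary.Add throws an ArgumentException.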

Is there a way to make string a reference type in a collection?

I want to modify some strings that are contained in an object like say an array, or maybe the nodes in an XDocument (XText)XNode.Value.
I want to gather a subset of strings from these objects and modify them, but I don't know at runtime from what object type they come from.
Put another way, let's say I have objects like this:
List<string> fruits = new List<string>() {"apple", "banana", "cantelope"};
XDocument _xmlObject;
I want to be able to add a subset of values from the original collections to new lists like this:
List<ref string> myStrings1 = new List<ref string>();
myStrings1.Add(ref fruits[1]);
myStrings1.Add(ref fruits[2]);
List<ref string> myStrings2 = new List<ref string>();
IEnumerable<XNode> xTextNodes = getTargetTextNodes(targetPath); //some function returns a series of XNodes in the XDocument
foreach (XNode node in xTextNodes)
{
myStrings2.Add(((XText)node).Value);
}
Then change the values using a general purpose method like this:
public void Modify(List<ref string> mystrings){
foreach (ref string item in mystrings)
{
item = "new string";
}
}
Such that I can pass that method any string collection, and modify the strings in the original object without having to deal with the original object itself.
static void Main(string[] args)
{
Modify(myStrings1);
Modify(myStrings2);
}
The important part here is the mystrings collection. That can be special. But I need to be able to use a variety of different kinds of strings and string collections as the originals source data to go in that collection.
Of course, the above code doesn't work, and neither does any variation I've tried. Is this even possible in c#?
What you want is possible with C#... but only if you can fix every possible source for your strings. That would allow you to use pointers to the original strings... at a terrible cost, however, in terms of memory management and unsafe code throughout your application.
I encourage you to pursue a different direction for this.
Based on your edits, it looks like you're always working with an entire collection, and always modifying the entire collection at once. Also, this might not even be a string collection at the outset. I don't think you'll be able to get the exact result you want, because of the base XDocument type you're working with. But one possible direction to explore might look like this:
public IEnumerable<string> Modify(IEnumerable<string> items)
{
foreach(string item in items)
{
yield return "blah";
}
}
You can use a projection to get strings from any collection type, and get your modified text back:
fruits = Modify(fruits).ToList();
var nodes = Modify(xTextNodes.Select(n => ((XText)n).Value));
And once you understand how to make a projection, you may find that the existing .Select() method already does everything you need.
What I really suggest, though, is that rather than working with an entire collection, think about working in terms of one record at a time. Create a common object type that all of your data sources understand. Create a projection from each data source into the common object type. Loop through each of the objects in your projection and make your adjustment. Then have another projection back to the original record type. This will not be the original collection. It will be a new collection. Write your new collection back to disk.
Used appropriately, this also has the potential for much greater performance than your original approach. This is because working with one record at a time, using these LINQ projections, opens the door to streaming the data, such that only the one current record is ever held in memory at a time. You can open a stream from the original and a stream for the output, and write to the output just as fast as you can read from the original.
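A rough sketch of that streaming idea, assuming the records are simply lines of text in a file. The class name, method names, and the change delegate are all hypothetical; a real data source would need its own projection to and from a common record type.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamingProjectionSketch
{
    // Lazily applies `change` to each record; nothing is buffered by this method.
    public static IEnumerable<string> Transform(IEnumerable<string> records, Func<string, string> change)
    {
        return records.Select(change);
    }

    // Reads, modifies, and writes one line at a time.
    public static void RewriteFile(string inputPath, string outputPath, Func<string, string> change)
    {
        // File.ReadLines enumerates lazily; WriteAllLines consumes the sequence as it writes.
        File.WriteAllLines(outputPath, Transform(File.ReadLines(inputPath), change));
    }
}
```

The result is always a new sequence (or a new file); the original collection is never modified in place.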
The easiest way to achieve this is by doing the looping outside of the method. This allows you to pass the strings by reference which will replace the existing reference with the new one (don't forget that strings are immutable).
An example of this:
void Main()
{
string[] arr = new[] {"lala", "lolo"};
arr.Dump();
for(var i = 0; i < arr.Length; i++)
{
ModifyStrings(ref arr[i]);
}
arr.Dump();
}
public void ModifyStrings(ref string item)
{
item = "blah";
}

SortedSet and SortedList fails with different enums

The whole story: I have some KeyValuePairs that I need to store in a session, and my primary goal is to keep it small. Therefore I don't have the option of using many different collections. While the key is a different enum value of a different enum type, the value is always just an enum value of the same enum type. I have chosen a Hashtable for this approach, whose content looks like this (just with many more entries):
// The Key-Value-Pairs
{ EnumTypA.ValueA1, MyEnum.ValueA },
{ EnumTypB.ValueB1, MyEnum.ValueB },
{ EnumTypC.ValueC1, MyEnum.ValueA },
{ EnumTypA.ValueA2, MyEnum.ValueC },
{ EnumTypB.ValueB1, MyEnum.ValueC }
Mostly I call Contains on that Hashtable, but I also need to fetch the value at some point and to loop through all elements. That all works fine, but now I have a new requirement to keep the order in which I added them to the Hashtable -> BANG
A Hashtable is a map and that is not possible!
Now I thought about using a SortedList<object, MyEnum> or to go with more Data but slightly faster lookups and use a SortedSet<object> in addition to the HashTable.
Content below has been edited
The SortedList is implemented as
SortedList<Enum, MyEnum> mySortedList = new SortedList<Enum, MyEnum>();
the SortedSet is implemented as
SortedSet<Enum> mySortedSet = new SortedSet<Enum>();
The described Key - Value - Pairs are added to the sorted list with
void AddPair(Enum key, MyEnum value)
{
mySortedList.Add(key, value);
}
And to the SortedSet like this
void AddPair(Enum key)
{
mySortedSet.Add(key);
}
Both are failing with the exception:
Object must be the same type as the enum
My question is: What goes wrong and how can I achieve my goal?
Used Solution
I've decided to live with the downside of redundant data rather than slower lookups, and to implement a List<Enum> which retains the insertion order in parallel to my already existing Hashtable.
In my case I just have about 50-150 elements, so I decided to benchmark the Hashtable against a List<KeyValuePair<object,object>>. For that I created the following helper to add ContainsKey() to List<KeyValuePair<object,object>>:
static bool ContainsKey(this List<KeyValuePair<object, object>> list, object key)
{
foreach (KeyValuePair<object, object> p in list)
{
if (p.Key.Equals(key))
return true;
}
return false;
}
I inserted the same 100 entries and checked randomly for one of ten different entries in a loop of 300000 iterations. And... the difference was tiny, so I decided to go with the List<KeyValuePair<object,object>>.
I think you should store your data in an instance of List<KeyValuePair<Enum, MyEnum>> or Dictionary<Enum, MyEnum>.
SortedSet and SortedList are generic, but your keys are EnumTypeA/EnumTypeB, you need to specify the generic T with their base class(System.Enum) like:
SortedList<Enum, MyEnum> sorted = new SortedList<Enum, MyEnum>();
EDIT
Why you got this exception
SortedList and SortedSet use a comparer inside to check if two keys are equal. Comparer<Enum>.Default will be used as the comparer if you didn't specify the comparer in the constructor. Unfortunately Comparer<Enum>.Default isn't implemented as you expected. It throws the exception if the two enums are not the same type.
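The behaviour of the default comparer can be checked in isolation. The sketch below uses a hypothetical helper class and probes Comparer<Enum>.Default directly; DayOfWeek and ConsoleColor are just two unrelated framework enums standing in for EnumTypA and EnumTypB.

```csharp
using System;
using System.Collections.Generic;

public static class EnumComparerProbe
{
    // Returns true when the default comparer rejects the pair with an ArgumentException.
    public static bool DefaultComparerThrows(Enum x, Enum y)
    {
        try
        {
            Comparer<Enum>.Default.Compare(x, y);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

DefaultComparerThrows(DayOfWeek.Monday, ConsoleColor.Red) should report true, while two values of the same enum type compare without an exception. Inside a sorted collection the same failure may surface wrapped in another exception type.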
How to resolve the problem
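If you don't want to use a List<KeyValuePair<Enum, MyEnum>> and insist on using SortedList, you need to pass a comparer to the constructor like this:
class EnumComparer : IComparer<Enum>
{
public int Compare(Enum x, Enum y)
{
// Order by enum type first, so equal underlying values of different
// enum types are not treated as duplicate keys.
int byType = string.CompareOrdinal(x.GetType().FullName, y.GetType().FullName);
if (byType != 0) return byType;
// Same type: compare the underlying values (avoids the overflow that
// subtracting hash codes can cause). Assumes the values fit in a long.
return Convert.ToInt64(x).CompareTo(Convert.ToInt64(y));
}
}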
var sorted = new SortedList<Enum, MyEnum>(new EnumComparer());
Btw, I think you need to retain the insertion order? If so, List<KeyValuePair<K,V>> is a better choice, because SortedList and SortedSet order their items by key rather than by insertion, and SortedSet will also reject duplicate items.

C#: Easy access to the member of a singleton ICollection<>?

I have an ICollection that I know will only ever have one member. Currently, I loop through it, knowing the loop will only ever run once, to grab the value. Is there a cleaner way to do this?
I could alter the persistentState object to return single values, but that would complicate the rest of the interface. It's grabbing data from XML, and for the most part ICollections are appropriate.
// worldMapLinks ensured to be a singleton
ICollection<IDictionary<string, string>> worldMapLinks = persistentState.GetAllOfType("worldMapLink");
string levelName = ""; //worldMapLinks.GetEnumerator().Current['filePath'];
// this loop will only run once
foreach (IDictionary<string, string> dict in worldMapLinks) // hacky hack hack hack
{
levelName = dict["filePath"];
}
// proceed with levelName
loadLevel(levelName);
Here is another example of the same issue:
// meta will be a singleton
ICollection<IDictionary<string, string>> meta = persistentState.GetAllOfType("meta");
foreach (IDictionary<string, string> dict in meta) // this loop should only run once. HACKS.
{
currentLevelName = dict["name"];
currentLevelCaption = dict["teaserCaption"];
}
Yet another example:
private Vector2 startPositionOfKV(ICollection<IDictionary<string, string>> dicts)
{
Vector2 result = new Vector2();
foreach (IDictionary<string, string> dict in dicts) // this loop will only ever run once
{
result.X = Single.Parse(dict["x"]);
result.Y = Single.Parse(dict["y"]);
}
return result;
}
Why not use the Single or FirstOrDefault extension methods?
var levelName = worldMapLinks.Single()["filePath"];
Single has the advantage of enforcing your assumption that there is only 1 value in the enumeration. If this is not true an exception will be raised forcing you to reconsider your logic. FirstOrDefault will return a default value if there is not at least 1 element in the enumeration.
If you can use LINQ-to-objects in your class, use the Single() extension method on the collection if you know there will be exactly one member. Otherwise, if there could be zero or one, use SingleOrDefault()
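As a small sketch of the difference between these methods, applied to the collection shape from the question (the class and method names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleElementSketch
{
    // Reads "filePath" from a collection expected to hold exactly one dictionary.
    public static string FilePathOf(ICollection<IDictionary<string, string>> links)
    {
        // Single() throws InvalidOperationException for zero or for multiple elements.
        return links.Single()["filePath"];
    }

    // Variant that tolerates an empty collection by returning null instead.
    public static string FilePathOrNull(ICollection<IDictionary<string, string>> links)
    {
        // SingleOrDefault() still throws if there is more than one element.
        var only = links.SingleOrDefault();
        return only == null ? null : only["filePath"];
    }
}
```

The first variant documents and enforces the "exactly one" assumption; the second is for the case where zero elements is a legitimate state.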
Why do you have a collection with only one member? It seems that the real answer should be to better design your system rather than rely on any method to retrieve one element from a collection. You say it makes it more complicated, but how? Isn't this solution itself a complication? Is it possible to change the interface to return one element where applicable and a collection elsewhere? Seems like a code smell to me.
