SortedSet Enumeration not working as expected - c#

Let's say ClassA is an arbitrary class and I create a SortedSet of ClassA. I store the set's enumerator in a dictionary, but whenever I try to read its Current value I always get null:
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new Dictionary<int, SortedSet<ClassA>.Enumerator>();
map[1] = set.GetEnumerator();
map[1].MoveNext();
var val = map[1].Current;
// Why is val null?

This happens because SortedSet<T>.Enumerator is a struct. Each time you use the dictionary's indexer to retrieve the enumerator, you get a new copy of it. So even though you call MoveNext() on that copy, the next time you get a copy of the enumerator, it does not have any value for Current.
Interestingly, because of a quirk in the exact implementation of that struct, each copy of the enumerator gets the same reference-type object to track the state of the enumeration (a stack), and so the MoveNext() method seems to work (i.e. it returns true the first time you call it, but false any subsequent time).
There are at least four options for handling this correctly…
Retrieve the copy into a variable, and use the variable instead of the dictionary:
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new Dictionary<int, SortedSet<ClassA>.Enumerator>();
map[1] = set.GetEnumerator();
var e = map[1];
e.MoveNext();
var val = e.Current;
Note that in this example, the dictionary's copy of the enumerator still will not have the Current value you want. You would have to set the dictionary's copy back again after calling MoveNext() to preserve that: map[1] = e;
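The copy-out, advance, copy-back pattern described above can be sketched as follows. This is only an illustration: the Advance helper is a hypothetical name, and SortedSet<string> stands in for SortedSet<ClassA> so the output is readable.

```csharp
using System;
using System.Collections.Generic;

var set = new SortedSet<string> { "a", "b" };
var map = new Dictionary<int, SortedSet<string>.Enumerator>();
map[1] = set.GetEnumerator();

// Hypothetical helper: copy the struct out, advance the copy, then store it back.
bool Advance(int key)
{
    var e = map[key];          // copy of the stored enumerator
    bool moved = e.MoveNext(); // advances the local copy
    map[key] = e;              // write the advanced copy back into the dictionary
    return moved;
}

while (Advance(1))
{
    // Current is read from the copy that was written back.
    Console.WriteLine(map[1].Current); // prints "a", then "b"
}
```

Every access still copies the struct, so this is mainly useful when the dictionary has to keep the value-type enumerator itself.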
Use an array instead of a dictionary:
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new SortedSet<ClassA>.Enumerator[2];
map[1] = set.GetEnumerator();
map[1].MoveNext();
var val = map[1].Current;
Other than the difference in the declaration and initialization of map, this works exactly as you have the code now. This is because indexed elements of an array are variables, rather than going through an indexer as the indexing syntax would with any other collection. So you are operating directly on the copy of the enumerator stored in the array, rather than a fresh copy returned by an indexer.
Of course, this would only work if the key values were constrained enough to make it feasible to allocate an array large enough to hold all possibilities for the key.
Declare a reference-type wrapper to contain the value-type enumerator:
class E<T>
{
public SortedSet<T>.Enumerator Enumerator;
public E(SortedSet<T>.Enumerator enumerator)
{
Enumerator = enumerator;
}
}
then…
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new Dictionary<int, E<ClassA>>();
map[1] = new E<ClassA>(set.GetEnumerator());
map[1].Enumerator.MoveNext();
var val = map[1].Enumerator.Current;
In this example, the dictionary is just returning the reference to the wrapper object, ensuring just a single copy of the enumerator (which is stored in that wrapper object rather than the dictionary). Thus every time you access the object through the dictionary, you get the same copy.
Of course, you wind up having to go through the Enumerator field of the wrapper. It's a bit clumsy. But it would work.
Store the enumerators in an array, but index through a dictionary:
var set = new SortedSet<ClassA>();
set.Add(new ClassA());
var map = new Dictionary<int, int>();
var a = new SortedSet<ClassA>.Enumerator[1];
map[1] = 0;
a[map[1]] = set.GetEnumerator();
a[map[1]].MoveNext();
var val = a[map[1]].Current;
This blends the two previous options. The array is used to store the actual enumerators, so that you can address them as variables, but the dictionary is used as a level of indirection to the array, so that you can refer to the enumerators via whatever keys you like.
Obviously, in a real application you would initialize the array by enumerating the original collections, storing their enumerators in the array and mapping the original key to the array index in the dictionary as you go.
The extra indirection is a little clumsy, as in the wrapper option, but not quite as bad as that, and solves the array-size concern of the array-based option.
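Beyond the options above, one more possibility (not one of the four listed, so treat it as a sketch) is to declare the dictionary's value type as the IEnumerator<T> interface. Assigning the struct to an interface-typed slot boxes it once, and the dictionary then hands back a reference to that single boxed enumerator, much like the wrapper class does. SortedSet<string> is used here only to make the output readable.

```csharp
using System;
using System.Collections.Generic;

var set = new SortedSet<string> { "a", "b" };

// The value type is the interface, so the struct enumerator is boxed on assignment.
var map = new Dictionary<int, IEnumerator<string>>();
map[1] = set.GetEnumerator();

bool moved = map[1].MoveNext(); // advances the one boxed enumerator
string val = map[1].Current;    // reads from the same boxed enumerator

Console.WriteLine($"{moved} {val}"); // prints "True a"
```

The cost is one allocation per enumerator plus interface dispatch on each call, which is usually negligible next to the clarity gained.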
Arguably, your question is a duplicate of this question: List.All(e => e.MoveNext()) doesn't move my enumerators on
It is definitely closely related to that one, as well as this one: Details on what happens when a struct implements an interface

Related

What is the optimal data structure for storing objects with a string key and a bool auxiliary value?

I need a data structure like the one below, but I need to be able to change the bool value. The other two stay as they were when they were initialized. What would you use for best performance?
Dictionary<string, (object, bool)> dic = new Dictionary<string, (object, bool)>();
I was thinking of a hashtable. But a hashtable is like a dictionary with key/value. The object and bool in my example are conceptually not a key/value pair, because other values of the outer dictionary can have the same object (or better yet ... object type). I don't want someone looking at my code later on to think that the object and bool are more related than they really are.
EDIT: object in this example is just a placeholder. In reality it's a complex object with other objects in it and so on. The procedure before this one makes a bunch of these objects, and some of them are deep copies of the others. They are passed to this procedure. All of the objects are named here by some rules and stored in the dictionary. Names are obviously unique. The procedure that comes after will take this dictionary and set the bool value on and off based on the values in the objects themselves and on the values of other bools. The procedure will be recursive until some state is reached.
Number of objects (or dic. entries) is arbitrary but expected to be >100 && <500. Time complexity is O(n).
I am targeting .NET7 (standard).
but I need to be able to change the bool value.
You can just reassign value for the key:
var tuples = new Dictionary<string, (object Obj, bool Bool)>
{
{ "1", (new object(), true) }
};
tuples["1"] = (tuples["1"].Obj, false); // or tuples["1"] = (tuples["1"].Item1, false);
Or
if (tuples.TryGetValue("1", out var c))
{
tuples["1"] = (c.Obj, false);
}
Personally I would leave it at that, but for really high-performance scenarios you can look into CollectionsMarshal instead of the second snippet:
ref var v = ref CollectionsMarshal.GetValueRefOrNullRef(tuples, "1");
if (!Unsafe.IsNullRef(ref v))
{
v.Bool = false;
}
A bit more info - here.
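For completeness, here is a self-contained sketch of the CollectionsMarshal approach with the using directives it needs. The TrySetFlag helper is a hypothetical name introduced for illustration. Keep in mind that the returned reference is only valid as long as the dictionary is not modified.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

var tuples = new Dictionary<string, (object Obj, bool Bool)>
{
    { "1", (new object(), true) }
};

// Hypothetical helper: flips the flag in place, without copying the tuple out and back in.
static bool TrySetFlag(Dictionary<string, (object Obj, bool Bool)> dict, string key, bool flag)
{
    // Returns a reference to the stored value, or a null reference if the key is missing.
    ref var entry = ref CollectionsMarshal.GetValueRefOrNullRef(dict, key);
    if (Unsafe.IsNullRef(ref entry))
    {
        return false;
    }
    entry.Bool = flag; // writes directly into the dictionary's storage
    return true;
}

Console.WriteLine(TrySetFlag(tuples, "1", false));       // True
Console.WriteLine(tuples["1"].Bool);                     // False
Console.WriteLine(TrySetFlag(tuples, "missing", false)); // False
```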
For the 'performance' aspect:
The .NET Dictionary uses hashes to look up the item you need, which is very fast (comparable to a Hashtable). I don't expect many performance issues related to this, or at least nothing that can be improved on with other data structures.
Also, you shouldn't worry about performance unless you are doing things a million times in a row + it turns out (in practice) that something is taking a measurable amount of time.
For the 'changing a bool' aspect:
... that is quite a long story.
There are 2 tuple variants in .NET:
The value tuple, created by doing var x = (myObj, myBool), like you are doing.
The x is a struct, and therefore a Value Type. You can actually change x.Item1 or x.Item2 to a new value just fine.
However... if you put x into a Dictionary then you actually put a copy of x (with a copy of its values) into the dictionary, because that is the nature of value types.
When you retrieve it again from the Dictionary, yet another copy is made - which makes modifying the actual tuple inside the Dictionary impossible; any attempt to do so would only modify the last copy you got.
Side story: the C# compiler knows this, which is why it refuses to compile code like dic[yourKey].Item2 = newBool; because such code wouldn't do what you might hope it would do. You're basically telling the compiler to create a copy, modify the copy, and then... discard the copy. The compiler requires a variable to store the copy before the rest can even start, but we provided no variable.
The Tuple generic class, or rather a range of generic classes, an instance of which can be created using calls like var x = Tuple.Create(myObj, myBool). These classes however forbid that you change any of their properties, they are always readonly. Tuple class instances can be put in a Dictionary, but they will still be readonly.
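The copy semantics of the value tuple described above can be observed with a short experiment. This is a minimal sketch; the element names Obj and Flag are chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;

var dict = new Dictionary<string, (object Obj, bool Flag)>();

var x = (Obj: new object(), Flag: true);
dict["abc"] = x;  // the dictionary stores a copy of x

x.Flag = false;   // changes only the local variable

Console.WriteLine(x.Flag);           // False
Console.WriteLine(dict["abc"].Flag); // True: the stored copy is unaffected

// The following line does not compile (error CS1612), because the indexer
// returns a temporary copy rather than a variable:
// dict["abc"].Flag = false;
```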
So what options are there really to 'modify a value in a tuple' in a Dictionary?
Keep using a value tuple, but accept that in order to "change" the tuple inside the Dictionary you'll have to make a new instance (either a copy, or from scratch), set its fields to the values that you want, and put that instance (or actually a copy...) into the dictionary:
// initialize it
var dict = new Dictionary<string, (object, bool)>();
var obj = new object();
dict["abc"] = (obj, true);
// change it
var tmpTuple = dict["abc"]; // get copy
tmpTuple.Item2 = false; // alter copy
dict["abc"] = tmpTuple; // store another copy
// or if you want to avoid the tmp variable
dict["abc"] = (dict["abc"].Item1, false);
Use a custom class instead of the value tuple or a Tuple class, and then put that into the Dictionary:
public class MyPair
{
public object O { get; set; }
public bool B { get; set; }
}
// initialize it
var dict = new Dictionary<string, MyPair>();
var obj = new object();
dict["abc"] = new MyPair { O = obj, B = true };
// change it
dict["abc"].B = false;
So both types of Tuples are OK for objects that you don't want to do a lot with. But both have certain limits in their usage, and sooner or later you may need to start using classes.

Enumerable.Repeat perform badly with for loop on initializing List<> [duplicate]

I have a question about Enumerable.Repeat function.
If I will have a class:
class A
{
//code
}
And I create an array of objects of that type:
A [] arr = new A[50];
And next, I will want to initialize those objects, calling Enumerable.Repeat:
arr = Enumerable.Repeat(new A(), 50).ToArray();
Will those objects have the same address in memory?
If I will want to check their hash code, for example in that way:
bool theSameHashCode = arr[0].GetHashCode() == arr[1].GetHashCode();
This returns true, and if I change a property of one object, all the other objects change too.
So my question is: is this the proper way to initialize reference-type objects? If not, what is a better way?
Using Enumerable.Repeat this way will initialize only one object and return that object every time when you iterate over the result.
Will those objects have the same address in memory?
There is only one object.
To achieve what you want, you can do this:
Enumerable.Range(1, 50).Select(i => new A()).ToArray();
This will return an array of 50 distinct objects of type A.
By the way, the fact that GetHashCode() returns the same value does not imply that the objects are referentially equal (or simply equal, for that matter). Two non-equal objects can have the same hash code.
Just to help clarify for Camilo, here's some test code that shows the issue at hand:
void Main()
{
var foos = Enumerable.Repeat(new Foo(), 2).ToArray();
foos[0].Name = "Jack";
foos[1].Name = "Jill";
Console.WriteLine(foos[0].Name);
}
public class Foo
{
public string Name;
}
This prints "Jill". Thus it shows that Enumerable.Repeat is only creating one instance of the Foo class.
When using the following code to create an array:
var foos = Enumerable.Repeat(new Foo(), 2).ToArray();
Each location in the array refers to the same object because you are passing an object, not a function that creates an object. The code above is the same as:
var foo = new Foo();
var foos = Enumerable.Repeat(foo, 2).ToArray();
The reason above also explains why using a Select statement, like in the code below, creates a new object for each entry, because you are passing a function that dictates how each object is created, rather than the object itself.
Enumerable.Range(1, 2).Select(i => new Foo()).ToArray();
I would use a simple for loop to populate an array with new reference types.
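A minimal sketch of that for loop, reusing the Foo class from the test code above:

```csharp
using System;

var foos = new Foo[50];
for (int i = 0; i < foos.Length; i++)
{
    foos[i] = new Foo(); // a separate instance for every slot
}

foos[0].Name = "Jack";
foos[1].Name = "Jill";

Console.WriteLine(foos[0].Name);                      // Jack
Console.WriteLine(ReferenceEquals(foos[0], foos[1])); // False

public class Foo
{
    public string Name;
}
```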

How to add a new element to a hashset that is value of a ConcurrentDictionary?

I have a ConcurrentDictionary that has a long as key and a hashset of int as value. If the key isn't in the dictionary, I want to add a new hashset with the first element. If the key exists, I want to add the new element to the existing hashset.
I am trying something like that:
ConcurrentDictionary<long, HashSet<int>> myDic = new ConcurrentDictionary<long, HashSet<int>>();
int myElement = 1;
myDic.AddOrUpdate(1, new HashSet<int>(){myFirstElement},
(key, actualValue) => actualValue.Add(myElement));
The problem with this code is the third parameter, because .Add() method returns a bool and the AddOrUpdate expects a hashset. The first and second parameters are right.
So my question is how I can add a new element to the hashset in thread-safe way and avoid duplicates (it is the reason why I am using a hashset as value). The problem of the hashset is that it is not thread-safe and if I get it first and later add the new element, I am doing outside of the dictionary and I could have problems.
Thanks.
To fix compiler error you can do this:
myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
(key, actualValue) => {
actualValue.Add(myFirstElement);
return actualValue;
});
BUT this is not thread safe, because "update" function is not run inside any lock so you are potentially adding to not-thread-safe HashSet from multiple threads. This might result in (for example) losing values (so you were adding 1000 items to HashSet but in the end you have only 970 items in it for example). Update function in AddOrUpdate should not have any side effects and here it does.
You can lock yourself over adding values to HashSet:
myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
(key, actualValue) => {
lock (actualValue) {
actualValue.Add(myFirstElement);
return actualValue;
}
});
But then question is why you are using lock-free structure (ConcurrentDictionary) in the first place. Besides that - any other code might get HashSet from your dictionary and add value there without any locks, making the whole thing useless. So if you decide to go that way for some reason - you have to ensure that all code locks when accessing HashSet from that dictionary.
Instead of all that, just use a concurrent collection instead of HashSet. There is no ConcurrentHashSet as far as I know, but you can use another ConcurrentDictionary with dummy values as a replacement (or look online for custom implementations).
Side note. Here
myDic.AddOrUpdate(1, new HashSet<int>(){myFirstElement},
you create a new HashSet every time you call AddOrUpdate, even if that HashSet is not needed because the key is already there. Instead, use the overload with an add-value factory:
myDic.AddOrUpdate(1, (key) => new HashSet<int>() { myFirstElement },
Edit: sample usage of ConcurrentDictionary as hash set:
var myDic = new ConcurrentDictionary<long, ConcurrentDictionary<int, byte>>();
long key = 1;
int element = 1;
var hashSet = myDic.AddOrUpdate(key,
_ => new ConcurrentDictionary<int, byte>(new[] {new KeyValuePair<int, byte>(element, 0)}),
(_, oldValue) => {
oldValue.TryAdd(element, 0);
return oldValue;
});
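Since the update delegate in the sample above only adds to the inner dictionary and returns it, the same effect can probably be had more simply with GetOrAdd followed by TryAdd. The following is a sketch under that assumption; AddElement is a hypothetical helper name, and the Parallel.For loop is only there to exercise it from several threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var myDic = new ConcurrentDictionary<long, ConcurrentDictionary<int, byte>>();

// Hypothetical helper: returns true if the element was newly added for that key.
bool AddElement(long key, int element)
{
    // GetOrAdd returns the inner set that is actually stored for the key, even if
    // several threads race to create it; TryAdd on that set is itself thread-safe.
    var innerSet = myDic.GetOrAdd(key, _ => new ConcurrentDictionary<int, byte>());
    return innerSet.TryAdd(element, 0); // the byte value is a dummy
}

// 1000 concurrent adds of 100 distinct values to the same key.
Parallel.For(0, 1000, i => { AddElement(1, i % 100); });

Console.WriteLine(myDic[1].Count); // 100
```

The value factory may run more than once under contention, but only one inner dictionary is ever stored, so no elements are lost.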
If you wrap the anonymous function definition in curly braces, you can define multiple statements in the body of the function and thus specify the return value like this:
myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
(key, actualValue) => {
actualValue.Add(myElement);
return actualValue;
});

SortedSet and SortedList fails with different enums

The whole story: I have some KeyValuePairs that I need to store in a session, and my primary goal is to keep it small. Therefore I don't have the option of using many different collections. While the key is a value of one of several different enum types, the value is always an enum value of the same enum type. I have chosen a HashTable for this approach, whose content looks like this (just many more):
// The Key-Value-Pairs
{ EnumTypA.ValueA1, MyEnum.ValueA },
{ EnumTypB.ValueB1, MyEnum.ValueB },
{ EnumTypC.ValueC1, MyEnum.ValueA },
{ EnumTypA.ValueA2, MyEnum.ValueC },
{ EnumTypB.ValueB1, MyEnum.ValueC }
Mostly I am running Contains on that HashTable, but I also need to fetch the value at some point and I need to loop through all elements. That all works fine, but now I have a new requirement to keep the order in which I added them to the HashTable -> BANG
A HashTable is a map and that is not possible!
Now I thought about using a SortedList<object, MyEnum> or to go with more Data but slightly faster lookups and use a SortedSet<object> in addition to the HashTable.
Content below has been edited
The SortedList is implemented as
SortedList<Enum, MyEnum> mySortedList = new SortedList<Enum, MyEnum>();
the SortedSet is implemented as
SortedSet<Enum> mySortedSet = new SortedSet<Enum>();
The described Key - Value - Pairs are added to the sorted list with
void AddPair(Enum key, MyEnum value)
{
mySortedList.Add(key, value);
}
And to the SortedSet like this
void AddPair(Enum key)
{
mySortedSet.Add(key);
}
Both are failing with the exception:
Object must be the same type as the enum
My question is: What goes wrong and how can I achieve my goal?
Used Solution
I've decided to live with the downside of redundant data rather than slower lookups, and decided to implement a List<Enum> which will retain the insertion order in parallel with my already existing HashTable.
In my case I have only about 50-150 elements, so I decided to benchmark the Hashtable against a List<KeyValuePair<object,object>>. For that I created the following helper to add ContainsKey() to the List<KeyValuePair<object,object>>:
static bool ContainsKey(this List<KeyValuePair<object, object>> list, object key)
{
foreach (KeyValuePair<object, object> p in list)
{
if (p.Key.Equals(key))
return true;
}
return false;
}
I inserted the same 100 entries and checked randomly for one of ten different entries in a loop of 300000 iterations. And... the difference was tiny, so I decided to go with the List<KeyValuePair<object,object>>.
I think you should store your data in an instance of List<KeyValuePair<Enum, MyEnum>> or Dictionary<Enum, MyEnum>.
SortedSet and SortedList are generic, but your keys are of different enum types (EnumTypeA/EnumTypeB), so you need to specify the generic type argument as their common base class (System.Enum), like:
SortedList<Enum, MyEnum> sorted = new SortedList<Enum, MyEnum>();
EDIT
Why you got this exception
SortedList and SortedSet use a comparer internally to order the keys and to check whether two keys are equal. Comparer<Enum>.Default is used if you don't specify a comparer in the constructor. Unfortunately, Comparer<Enum>.Default doesn't behave as you expected: it throws this exception if the two enums are not of the same type.
How to resolve the problem
If you don't want to use a List<KeyValuePair<Enum, MyEnum>> and insist on using SortedList, you need to pass a comparer to the constructor like this:
class EnumComparer : IComparer<Enum>
{
public int Compare(Enum x, Enum y)
{
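// CompareTo avoids the integer overflow that subtracting hash codes can cause.
// Note: values of different enum types can share a hash code and then compare as equal.
return x.GetHashCode().CompareTo(y.GetHashCode());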
}
}
var sorted = new SortedList<Enum, MyEnum>(new EnumComparer());
By the way, I think you need to retain the insertion order? If so, List<KeyValuePair<K,V>> is a better choice, because sorted collections order items by key rather than by insertion, and SortedSet will prevent duplicate items.
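If a sorted collection with mixed enum keys is still wanted, one way to keep keys of different enum types from colliding is to compare by type name first and by value only within the same type. This is a sketch under that assumption, not the comparer from the answer; DayOfWeek, ConsoleKey and ConsoleColor are stand-ins for EnumTypA, EnumTypB and MyEnum. It sorts by key and does not retain insertion order.

```csharp
using System;
using System.Collections.Generic;

// Assumed ordering: enum type name first, then the value within the same type.
var crossEnumComparer = Comparer<Enum>.Create((x, y) =>
{
    int byType = string.CompareOrdinal(x.GetType().FullName, y.GetType().FullName);
    // Enum.CompareTo is only reached when both values have the same enum type.
    return byType != 0 ? byType : x.CompareTo(y);
});

var sorted = new SortedList<Enum, ConsoleColor>(crossEnumComparer);
sorted.Add(DayOfWeek.Monday, ConsoleColor.Red);
sorted.Add(ConsoleKey.A, ConsoleColor.Green);
sorted.Add(DayOfWeek.Sunday, ConsoleColor.Blue);

foreach (var pair in sorted)
{
    Console.WriteLine($"{pair.Key.GetType().Name}.{pair.Key} = {pair.Value}");
}
// Prints:
// ConsoleKey.A = Green
// DayOfWeek.Sunday = Blue
// DayOfWeek.Monday = Red
```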

how to add an associative index to an array. c#

I have an array of custom objects. I'd like to be able to reference this array by a particular data member, for instance myArray["Item1"]
"Item1" is actually the value stored in the Name property of this custom type, and I can write a predicate to mark the appropriate array item. However, I am unclear as to how to let the array know I'd like to use this predicate to find the array item.
I'd like to just use a dictionary or hashtable or NameValuePair for this array and get around this whole problem, but it's generated and it must remain as CustomObj[]. I'm also trying to avoid loading a dictionary from this array, as it's going to happen many times and there could be many objects in it.
For clarification
myArray[5] = new CustomObj() // easy!
myArray["ItemName"] = new CustomObj(); // how to do this?
Can the above be done? I'm really just looking for something similar to how DataRow.Columns["MyColumnName"] works
Thanks for the advice.
What you really want is an OrderedDictionary. The version that .NET provides in System.Collections.Specialized is not generic - however there is a generic version on CodeProject that you could use. Internally, this is really just a hashtable married to a list ... but it is exposed in a uniform manner.
If you really want to avoid using a dictionary - you're going to have to live with O(n) lookup performance for an item by key. In that case, stick with an array or list and just use the LINQ Where() method to lookup a value. You can use either First() or Single() depending on whether duplicate entries are expected.
var myArrayOfCustom = ...
var item = myArrayOfCustom.Where( x => x.Name == "yourSearchValue" ).First();
It's easy enough to wrap this functionality into a class so that external consumers are not burdened by this knowledge, and can use simple indexers to access the data. You could then add features like memoization if you expect the same values are going to be accessed frequently. In this way you could amortize the cost of building the underlying lookup dictionary over multiple accesses.
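One way such a wrapper could look is sketched below. NamedArray is a hypothetical name, and CustomObj is a stand-in for the generated type with only a Name property assumed. The name-to-position lookup is built lazily on the first access by name, so its cost is spread over later lookups. The sketch assumes names do not change after the lookup has been built.

```csharp
using System;
using System.Collections.Generic;

var myArray = new[] { new CustomObj { Name = "Item1" }, new CustomObj { Name = "Item2" } };
var lookup = new NamedArray(myArray);

Console.WriteLine(lookup["Item2"].Name); // Item2
Console.WriteLine(lookup[0].Name);       // Item1

// Stand-in for the generated type; only the Name property is assumed.
public class CustomObj
{
    public string Name { get; set; }
}

// Hypothetical wrapper that keeps the CustomObj[] as the underlying storage.
public class NamedArray
{
    private readonly CustomObj[] _items;
    private Dictionary<string, int> _indexByName; // built on first lookup by name

    public NamedArray(CustomObj[] items) => _items = items;

    public CustomObj this[int index]
    {
        get => _items[index];
        set { _items[index] = value; _indexByName = null; } // force a rebuild
    }

    public CustomObj this[string name]
    {
        get => _items[IndexOf(name)];
        set { _items[IndexOf(name)] = value; _indexByName = null; }
    }

    private int IndexOf(string name)
    {
        if (_indexByName == null)
        {
            _indexByName = new Dictionary<string, int>();
            for (int i = 0; i < _items.Length; i++)
            {
                _indexByName[_items[i].Name] = i; // for duplicate names, the last one wins
            }
        }
        return _indexByName.TryGetValue(name, out int index)
            ? index
            : throw new KeyNotFoundException(name);
    }
}
```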
If you do not want to use a Dictionary, you could create a class "MyArray" that stores the data, and add an indexer of type int for positional access and one of type string for associative access.
public CustomObj this [string index]
{
get
{
return data[searchIdxByName(index)];
}
set
{
data[searchIdxByName(index)] = value;
}
}
First link in google for indexers is: http://www.csharphelp.com/2006/04/c-indexers/
You could use a dictionary for this. It might not be the best solution in the world, but it is the first one I came up with.
Dictionary<string, int> d = new Dictionary<string, int>();
d.Add("cat", 2);
d.Add("dog", 1);
d.Add("llama", 0);
d.Add("iguana", -1);
the ints could be objects, what you like :)
http://dotnetperls.com/dictionary-keys
Perhaps OrderedDictionary is what you're looking for.
You can use a Hashtable:
System.Collections.Hashtable o_Hash_Table = new Hashtable();
o_Hash_Table.Add("Key", "Value");
There is a class in the System.Collections.Generic namespace called Dictionary<K,V> that you should use.
var d = new Dictionary<string, MyObj>();
MyObj o = d["a string variable"];
Another way would be to code two methods/a property:
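// Assumes My_Enumerable is a List<MyObj>.
public MyObj this[string index]
{
    get
    {
        foreach (var o in My_Enumerable)
        {
            if (o.Name == index)
            {
                return o;
            }
        }
        return null; // no item with that name
    }
    set
    {
        // Find the position first: the list must not be modified while it is being enumerated.
        var i = My_Enumerable.FindIndex(o => o.Name == index);
        if (i >= 0)
        {
            My_Enumerable[i] = value; // replace the matching item in place
        }
    }
}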
I hope it helps!
It depends on the collection, some collections allow accessing by name and some don't. Accessing with strings is only meaningful when the collection has data stored, the column collection identifies columns by their name, thus allowing you to select a column by its name. In a normal array this would not work because items are only identified by their index number.
My best recommendation, if you can't change it to use a dictionary, is to either use a Linq expression:
var item1 = myArray.Where(x => x.Name == "Item1").FirstOrDefault();
or, make an extension method that uses a linq expression:
public static class CustomObjExtensions
{
public static CustomObj Get(this CustomObj[] Array, string Name)
{
return Array.Where(x => x.Name == Name).FirstOrDefault();
}
}
then in your app:
var item2 = myArray.Get("Item2");
Note however that performance wouldn't be as good as using a dictionary, since behind the scenes .NET will just loop through the list until it finds a match, so if your list isn't going to change frequently, then you could just make a Dictionary instead.
I have two ideas:
1) I'm not sure you're aware but you can copy dictionary objects to an array like so:
Dictionary<string, int> dict = new Dictionary<string, int>();
dict.Add("tesT",40);
int[] myints = new int[dict.Count];
dict.Values.CopyTo(myints, 0);
This might allow you to use a Dictionary for everything while still keeping the output as an array.
2) You could also actually create a DataTable programmatically if that's the exact functionality you want:
DataTable dt = new DataTable();
DataColumn dc1 = new DataColumn("ID", typeof(int));
DataColumn dc2 = new DataColumn("Name", typeof(string));
dt.Columns.Add(dc1);
dt.Columns.Add(dc2);
DataRow row = dt.NewRow();
row["ID"] = 100;
row["Name"] = "Test";
dt.Rows.Add(row);
You could also create this outside of the method so you don't have to make the table over again every time.
