Immutable set in .NET


Does the .NET BCL have an immutable Set type? I'm programming in a functional dialect of C# and would like to do something like
new Set.UnionWith(A).UnionWith(B).UnionWith(C)
But the best I can find is HashSet.UnionWith, which would require the following sequence of calls:
HashSet<T> composite = new HashSet<T>();
composite.UnionWith(A);
composite.UnionWith(B);
composite.UnionWith(C);
This use is highly referentially opaque, making it hard to optimize and understand. Is there a better way to do this without writing a custom functional set type?

The new immutable collections in System.Collections.Immutable include:
ImmutableStack<T>
ImmutableQueue<T>
ImmutableList<T>
ImmutableHashSet<T>
ImmutableSortedSet<T>
ImmutableDictionary<K, V>
ImmutableSortedDictionary<K, V>
Regarding the union, this test passes:
[Test]
public void UnionTest()
{
    var a = ImmutableHashSet.Create("A");
    var b = ImmutableHashSet.Create("B");
    var c = ImmutableHashSet.Create("C");
    var d = a.Union(b).Union(c);
    Assert.IsTrue(ImmutableHashSet.Create("A", "B", "C").SetEquals(d));
}
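As a hedged sketch of the functional style the question asks for, assuming .NET Core/.NET 5+ (or the System.Collections.Immutable package on .NET Framework): each Union call returns a new set and leaves its inputs untouched, and Aggregate folds any number of sets starting from ImmutableHashSet<T>.Empty. The class ImmutableUnionDemo and the method UnionAll are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

public static class ImmutableUnionDemo
{
    // Hypothetical helper: folds any number of sets into one, starting from the empty set.
    // ImmutableHashSet<T>.Union returns a new set; none of the inputs is modified.
    public static ImmutableHashSet<T> UnionAll<T>(params IEnumerable<T>[] sets) =>
        sets.Aggregate(ImmutableHashSet<T>.Empty, (acc, s) => acc.Union(s));
}
```

With three single-element sets a, b and c like the ones in the test above, UnionAll(a, b, c) yields {A, B, C}, and a still contains only "A" afterwards.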

Update
This answer was written some time ago; since then, a set of immutable collections has been introduced in the System.Collections.Immutable namespace.
Original answer
You can roll your own method for this:
public static class HashSetExtensions {
    public static HashSet<T> Union<T>(this HashSet<T> self, HashSet<T> other) {
        var set = new HashSet<T>(self); // copy, so the original set is not changed
        set.UnionWith(other);
        return set;
    }
}
Use it like this:
var composite = A.Union(B).Union(C);
You can also use LINQ's Union, but to get a set, you'll need to pass the result to the HashSet constructor:
var composite = new HashSet<string>(A.Union(B).Union(C));
However, HashSet itself is mutable. You could try F#'s immutable set instead.
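Also, as ErikE mentioned in the comments, using Concat yields the same result and probably performs better, since the HashSet constructor discards duplicates anyway, so the de-duplication done by Union is wasted work: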
var composite = new HashSet<string>(A.Concat(B).Concat(C));

There is a ReadOnlyCollection, but it's not a hash table. LINQ adds the Union method as an extension.

Related

How to sort a dictionary that contains dictionary?

I have a dictionary that contains another dictionary as its value,
something like this:
Dictionary<string, Dictionary<double,double>>
Now I want to sort it by the inner dictionary's values.
How can I do that?

Judging from your comment:
sorry i am not expert on c#, would you suggest a way to store 3 values as one item?
I would suggest creating a class and sorting on it, like this:
public class MyClass
{
    public string StringProperty { get; set; }
    public double FirstDoubleProperty { get; set; }
    public double SecondDoubleProperty { get; set; }
}
Then create a collection like this:
List<MyClass> MyClasscol = new List<MyClass>();
MyClass mc = new MyClass();
mc.StringProperty = "User1225072";
mc.FirstDoubleProperty = 5;
mc.SecondDoubleProperty = 6;
MyClasscol.Add(mc);
mc = new MyClass();
// and so on
Then sort like this:
var newsortedcollection = MyClasscol.OrderBy(x => x.FirstDoubleProperty);
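A self-contained sketch of the class-based approach, assuming the two numeric properties are meant to hold doubles as their names suggest. The ThenBy tie-breaker and the MyClassSorting helper are illustrative additions, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public string StringProperty { get; set; }
    public double FirstDoubleProperty { get; set; }
    public double SecondDoubleProperty { get; set; }
}

public static class MyClassSorting
{
    // Sorts by FirstDoubleProperty and breaks ties with SecondDoubleProperty.
    // OrderBy/ThenBy are lazy; ToList materializes the sorted result.
    public static List<MyClass> Sort(IEnumerable<MyClass> items) =>
        items.OrderBy(x => x.FirstDoubleProperty)
             .ThenBy(x => x.SecondDoubleProperty)
             .ToList();
}
```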

Assuming you are now trying to figure out how to store and order a collection of objects with multiple properties, you have a few options. Nikhil Agrawal's answer is a great solution, but there are times when you may not need or want to create a custom class for this. For those situations (preferably when your code is private and not part of a public API), the alternatives below might be an option.
KeyValuePairs
Based on your requirements and your original post using dictionaries, it seems that instead of a dictionary of dictionaries (multi-tiered), you probably wanted a dictionary of KeyValuePairs (flat).
// using keyvaluepair
var keyValueDict = new Dictionary<string, KeyValuePair<double, double>>();
keyValueDict.Add("string", new KeyValuePair<double, double>(5.8, 7.4));
var sortedKeyValues = keyValueDict.OrderBy(x => x.Value.Key);
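If the data already lives in the nested Dictionary<string, Dictionary<double, double>> from the question, one option (sketched here under that assumption; the Flatten helper is hypothetical) is to flatten it with SelectMany into one Tuple per inner entry and sort the flat sequence by the inner value:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NestedDictionarySorting
{
    // Produces one (outerKey, innerKey, innerValue) tuple per inner entry,
    // ordered by innerValue. The dictionaries themselves are left unchanged.
    public static List<Tuple<string, double, double>> Flatten(
        Dictionary<string, Dictionary<double, double>> nested) =>
        nested
            .SelectMany(outer => outer.Value.Select(inner =>
                Tuple.Create(outer.Key, inner.Key, inner.Value)))
            .OrderBy(t => t.Item3)
            .ToList();
}
```

Sorting the flat sequence sidesteps the fact that a Dictionary has no meaningful order of its own.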
Tuples
An alternative to the not-so-pleasant KeyValuePair is the Tuple class introduced in .NET 4. Tuple is a generic class that lets you store typed property values without creating your own custom class. It is worth noting that there are Tuple types with up to 8 type parameters; the eighth, TRest, holds a nested tuple for longer sequences.
// using tuple
var tupleList = new List<Tuple<string, double, double>>();
tupleList.Add(new Tuple<string, double, double>("string", 5.8, 7.4));
var sortedTuples = tupleList.OrderBy(x => x.Item2);
There are some good SO questions about Tuples if you are interested:
Is Using .NET 4.0 Tuples in my C# Code a Poor Design Decision?
Are EventArg classes needed now that we have generics

Ideal c# reduction method for values of the same type, with bitwise approach?

Good day all,
I have a class and a property, and I have three instances of that class.
public class myclass {
    public int myproperty;
}
...
myclass inst1, inst2, inst3;
...
Now, at a certain point, I need to compare those three property values and check whether they are equal, so that I end up with the smallest set of distinct values.
So if I have
inst1.myproperty = 3;
inst2.myproperty = 5;
inst3.myproperty = 3;
after the magic_function_call, I should get 3 and 5.
And if I have
inst1.myproperty = 2;
inst2.myproperty = 2;
inst3.myproperty = 2;
after the magic_function_call, I should get 2.
Although this is trivial per se, and can be solved with as many if checks as needed, I was wondering which is the fastest or most efficient way to do it, especially since I might need to add another variable to the check in the future.
I am in fact wondering if there is a bitwise operation that can be performed that can solve this elegantly and quickly.
Alternatively, is there an array operation that can be used to achieve the same goal? I've tried looking for 'reduction' or 'compression' but those keywords don't seem to lead in the right direction.
Thanks in advance.

You can use the MoreLINQ DistinctBy query operator if all of the instances belong to a collection:
List<MyClass> myList = new List<MyClass>();
// ... populate list
List<MyClass> distinct = myList.DistinctBy(mc => mc.myproperty).ToList();
Looking at the question, you may want a list of just the property values (a list of ints), which you can achieve with the standard query operators:
List<int> distinct = myList.Select(mc => mc.myproperty).Distinct().ToList();
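A small sketch of the Select/Distinct approach applied to the two examples from the question. It keeps the question's public field myproperty; the wrapper class ReduceDemo is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public class myclass
{
    public int myproperty;
}

public static class ReduceDemo
{
    // Projects each instance to its value and keeps one copy of each value.
    // Works for any number of instances, so adding a fourth needs no new checks.
    public static List<int> DistinctValues(params myclass[] items) =>
        items.Select(mc => mc.myproperty).Distinct().ToList();
}
```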
Note that you haven't defined a property; you've defined a public field. To define an auto-property instead, change:
public int myproperty;
to
public int myproperty { get; set; }
Note also that PascalCasing is recommended for property and class names.

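Here's a quick function that doesn't require any extra libraries and avoids the setup costs and overheads associated with LINQ:
static int[] Reduce(IEnumerable<myclass> items)
{
    HashSet<int> uniqueValues = new HashSet<int>();
    foreach (var item in items)
    {
        uniqueValues.Add(item.myproperty); // Add ignores values already present
    }
    // Copy without LINQ (ToArray would be an Enumerable extension method)
    int[] result = new int[uniqueValues.Count];
    uniqueValues.CopyTo(result);
    return result;
}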
Pass it a collection of your myclass instances and it will return an array of unique myproperty values.

Just another way to implement it:
var result = myList.GroupBy(p => p.myproperty).Select(p => p.First());

creating a generic collection of key and values in C# 4.0

I need something like a Dictionary whose dynamic values can be anything from strings to objects.
But when I use objects, I need to know the type of the object and then access the appropriate properties of that object.
Is there a way to do this WITHOUT using reflection?
EDIT:
I tried to use this:
CloneObject<T, TU>(IDictionary<T, TU> sourceObject)
But if I use this, how can I access T's public fields without using reflection?

You can use a Hashtable for this purpose.
Here are some examples:
http://www.dotnetperls.com/hashtable
You can also use a Dictionary, which is more efficient than a Hashtable.
See examples here:
http://www.dotnetperls.com/dictionary-keys

I'm a little confused. You're trying to store objects of any type in your dictionary but access them without reflection.
If so, you can use dynamic types:
var dict = new Dictionary<string, dynamic>();
dict["string"] = "some string";
dict["int"] = 25;
dict["my_class"] = new MyClass {SomeProperty = 12};
And then you can access all these values without any casts:
string s1 = dict["string"].Substring(0, 4); // s1 equals to "some"
int propertyValue = dict["my_class"].SomeProperty; // propertyValue equals to 12
where MyClass is:
class MyClass
{
    public int SomeProperty {get;set;}
}
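As an alternative to dynamic (not part of the answer above), C# 7.0 and later support type patterns, which check an object's runtime type and expose it as a typed variable without reflection. A minimal sketch reusing the MyClass shape above; the PatternDemo class and Describe method are hypothetical.

```csharp
class MyClass
{
    public int SomeProperty { get; set; }
}

static class PatternDemo
{
    // Dispatches on the runtime type of each value; no reflection and no dynamic binding.
    public static string Describe(object value)
    {
        switch (value)
        {
            case string s:
                return "string of length " + s.Length;
            case int i:
                return "int: " + i;
            case MyClass mc:
                return "MyClass: " + mc.SomeProperty;
            default:
                return "unknown";
        }
    }
}
```

The dictionary can then be a plain Dictionary<string, object>; unlike dynamic, a typo in a property name is caught at compile time rather than at run time.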

Without using reflection, this task cannot be completed. All I did was create clones of the objects separately and then use them.

Does Distinct() method keep original ordering of sequence intact?

I want to remove duplicates from list, without changing order of unique elements in the list.
Jon Skeet and others have suggested using the following:
list = list.Distinct().ToList();
Reference:
How to remove duplicates from a List<T>?
Remove duplicates from a List<T> in C#
Is it guaranteed that the order of unique elements would be same as before? If yes, please give a reference that confirms this as I couldn't find anything on it in documentation.

It's not guaranteed, but it's the most obvious implementation. It would be hard to implement in a streaming manner (i.e. such that it returned results as soon as it could, having read as little as it could) without returning them in order.
You might want to read my blog post on the Edulinq implementation of Distinct().
Note that even if this were guaranteed for LINQ to Objects (which personally I think it should be) that wouldn't mean anything for other LINQ providers such as LINQ to SQL.
The level of guarantees provided within LINQ to Objects is a little inconsistent sometimes, IMO. Some optimizations are documented, others not. Heck, some of the documentation is flat out wrong.

In the .NET Framework 3.5, disassembling the CIL of the Linq-to-Objects implementation of Distinct() shows that the order of elements is preserved - however this is not documented behavior.
I did a little investigation with Reflector. After disassembling System.Core.dll, Version=3.5.0.0 you can see that Distinct() is an extension method, which looks like this:
public static class Enumerable
{
    public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        return DistinctIterator<TSource>(source, null);
    }
}
So the interesting part here is DistinctIterator, which implements IEnumerable and IEnumerator. Here is a simplified implementation of this IEnumerator (gotos and labels removed):
private sealed class DistinctIterator<TSource> : IEnumerable<TSource>, IEnumerable, IEnumerator<TSource>, IEnumerator, IDisposable
{
    private bool _enumeratingStarted;
    private IEnumerator<TSource> _sourceListEnumerator;
    public IEnumerable<TSource> _source;
    private HashSet<TSource> _hashSet;
    private TSource _current;
    public bool MoveNext()
    {
        if (!_enumeratingStarted)
        {
            _sourceListEnumerator = _source.GetEnumerator();
            _hashSet = new HashSet<TSource>();
            _enumeratingStarted = true;
        }
        while (_sourceListEnumerator.MoveNext())
        {
            TSource element = _sourceListEnumerator.Current;
            if (!_hashSet.Add(element))
                continue; // already returned: skip it
            _current = element;
            return true;
        }
        return false;
    }
    void IEnumerator.Reset()
    {
        throw new NotSupportedException();
    }
    TSource IEnumerator<TSource>.Current
    {
        get { return _current; }
    }
    object IEnumerator.Current
    {
        get { return _current; }
    }
}
As you can see, enumeration follows the order provided by the source enumerable (the list on which we call Distinct). The HashSet is used only to determine whether we have already returned an element. If not, we return it; otherwise, we continue enumerating the source.
So this implementation guarantees that Distinct() returns elements in exactly the same order in which they are provided by the collection to which it is applied.

According to the documentation, the sequence is unordered.

Yes, Enumerable.Distinct preserves order. Assuming the method is lazy ("yields distinct values as soon as they are seen"), this follows automatically. Think about it.
The .NET Reference Source confirms this: it returns a subsequence consisting of the first element in each equivalence class.
foreach (TSource element in source)
    if (set.Add(element)) yield return element;
The .NET Core implementation is similar.
Frustratingly, the documentation for Enumerable.Distinct is confused on this point:
The result sequence is unordered.
I can only imagine they mean "the result sequence is not sorted." You could implement Distinct by presorting then comparing each element to the previous, but this would not be lazy as defined above.
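The laziness argument can be checked directly. A sketch, assuming LINQ to Objects: Distinct is applied to an endless iterator (the hypothetical Endless method) and still returns results, which is only possible if each value is yielded when it is first seen, in source order.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyDistinctDemo
{
    // Endless sequence 1, 0, 2, 1, 3, 2, 4, 3, ... (never terminates on its own).
    public static IEnumerable<int> Endless()
    {
        for (int n = 0; ; n++)
        {
            yield return n + 1;
            yield return n;
        }
    }

    // Take(3) stops pulling after three distinct values. A sort-first Distinct
    // could never return here, and the values come out in first-seen order (1 before 0).
    public static List<int> FirstThreeDistinct() => Endless().Distinct().Take(3).ToList();
}
```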

A bit late to the party, but no one really posted the best complete code to accomplish this IMO, so let me offer this (which is essentially identical to what .NET Framework does with Distinct())*:
public static IEnumerable<T> DistinctOrdered<T>(this IEnumerable<T> items)
{
    HashSet<T> returnedItems = new HashSet<T>();
    foreach (var item in items)
    {
        if (returnedItems.Add(item))
            yield return item;
    }
}
This guarantees the original order without reliance on undocumented or assumed behavior. I also believe this is more efficient than using multiple LINQ methods though I'm open to being corrected here.
(*) The .NET Framework source uses an internal Set class, which appears to be substantively identical to HashSet.

By default, the Distinct LINQ operator uses the Equals method, but you can supply your own IEqualityComparer<T> object to decide when two objects are equal using custom logic, by implementing the GetHashCode and Equals methods.
Remember that:
GetHashCode should not perform CPU-heavy comparisons (e.g. use only some obvious basic checks). It is used first to determine whether two objects are surely different (different hash codes are returned) or potentially the same (same hash code). In the latter case, when two objects have the same hash code, the framework goes on to check with the Equals method, which makes the final decision about the equality of the given objects.
Given MyType and MyTypeEqualityComparer classes, the following code does not ensure that the sequence maintains its order:
var cmp = new MyTypeEqualityComparer();
var lst = new List<MyType>();
// add some to lst
var q = lst.Distinct(cmp);
In a scientific library, I implemented an extension method, DistinctKeepOrder, to ensure that a Vector3D set maintains its order.
The relevant code follows:
/// <summary>
/// support class for DistinctKeepOrder extension
/// </summary>
public class Vector3DWithOrder
{
    public int Order { get; private set; }
    public Vector3D Vector { get; private set; }
    public Vector3DWithOrder(Vector3D v, int order)
    {
        Vector = v;
        Order = order;
    }
}
public class Vector3DWithOrderEqualityComparer : IEqualityComparer<Vector3DWithOrder>
{
    Vector3DEqualityComparer cmp;
    public Vector3DWithOrderEqualityComparer(Vector3DEqualityComparer _cmp)
    {
        cmp = _cmp;
    }
    public bool Equals(Vector3DWithOrder x, Vector3DWithOrder y)
    {
        return cmp.Equals(x.Vector, y.Vector);
    }
    public int GetHashCode(Vector3DWithOrder obj)
    {
        return cmp.GetHashCode(obj.Vector);
    }
}
In short, Vector3DWithOrder encapsulates the type together with an order integer, while Vector3DWithOrderEqualityComparer encapsulates the original type's comparer.
And this is the helper method that ensures the order is maintained:
/// <summary>
/// retrieves the distinct vectors of the given set, keeping their original order
/// </summary>
public static IEnumerable<Vector3D> DistinctKeepOrder(this IEnumerable<Vector3D> vectors, Vector3DEqualityComparer cmp)
{
    var ocmp = new Vector3DWithOrderEqualityComparer(cmp);
    return vectors
        .Select((w, i) => new Vector3DWithOrder(w, i))
        .Distinct(ocmp)
        .OrderBy(w => w.Order)
        .Select(w => w.Vector);
}
Note: further research could yield a more general (interface-based) and optimized approach (one that does not wrap the object).
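Along the lines of that note, a generic sketch needs neither a wrapper type nor OrderBy: it streams the source once and yields each element the first time the comparer reports it as new. The DistinctExtensions class is hypothetical, and the method name simply reuses the answer's DistinctKeepOrder.

```csharp
using System.Collections.Generic;

public static class DistinctExtensions
{
    // Yields elements in source order, skipping any element the comparer considers
    // equal to one already yielded. A null comparer means default equality.
    public static IEnumerable<T> DistinctKeepOrder<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        var seen = new HashSet<T>(comparer);
        foreach (var item in source)
        {
            if (seen.Add(item))
                yield return item;
        }
    }
}
```

A Vector3DEqualityComparer could be passed straight in as the comparer, since it already compares Vector3D values.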

This highly depends on your LINQ provider. With LINQ to Objects you can rely on the internal source code for Distinct, which suggests that the original order is preserved.
However, for other providers that translate to some kind of SQL, for example, that isn't necessarily the case, as an ORDER BY statement usually comes after any aggregation (such as Distinct). So if your code is this:
myArray.OrderBy(x => x.anothercol).GroupBy(x => x.mycol);
this is translated to something similar to the following in SQL:
SELECT * FROM mytable GROUP BY mycol ORDER BY anothercol;
This obviously groups your data first and sorts it afterwards. Now you're stuck with the DBMS's own logic for how to execute that. On some DBMSs this isn't even allowed. Imagine the following data:
mycol anothercol
1 2
1 1
1 3
2 1
2 3
When executing myArray.OrderBy(x => x.anothercol).GroupBy(x => x.mycol), we assume the following result:
mycol anothercol
1 1
2 1
But the DBMS may aggregate the anothercol column so that the value of the first row is always used, resulting in the following data:
mycol anothercol
1 2
2 1
which after ordering will result in this:
mycol anothercol
2 1
1 2
This is similar to the following:
SELECT mycol, First(anothercol) from mytable group by mycol order by anothercol;
which is the exact reverse of the order you expected.
As you can see, the execution plan may vary depending on the underlying provider. This is why the docs give no guarantee about it.
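For contrast, a sketch of the same query in LINQ to Objects, using the table data above as (mycol, anothercol) tuples. In memory the result matches the expectation, because Enumerable.OrderBy is a stable sort and Enumerable.GroupBy keeps the elements of each group in source order. The class and member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByOrderDemo
{
    // The rows from the table above, as (mycol, anothercol) pairs.
    public static readonly Tuple<int, int>[] Rows =
    {
        Tuple.Create(1, 2), Tuple.Create(1, 1), Tuple.Create(1, 3),
        Tuple.Create(2, 1), Tuple.Create(2, 3),
    };

    // Sort by anothercol, group by mycol, keep the first row of each group.
    // In memory, each group's first row is the one with the smallest anothercol.
    public static List<Tuple<int, int>> FirstPerGroup() =>
        Rows.OrderBy(r => r.Item2)
            .GroupBy(r => r.Item1)
            .Select(g => g.First())
            .ToList();
}
```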

How to convert linq results to HashSet or HashedSet

I have a property on a class that is an ISet. I'm trying to get the results of a linq query into that property, but can't figure out how to do so.
Basically, looking for the last part of this:
ISet<T> foo = new HashedSet<T>();
foo = (from x in bar.Items select x).SOMETHING;
Could also do this:
HashSet<T> foo = new HashSet<T>();
foo = (from x in bar.Items select x).SOMETHING;

I don't think there's anything built in which does this... but it's really easy to write an extension method:
public static class Extensions
{
    public static HashSet<T> ToHashSet<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T> comparer = null)
    {
        return new HashSet<T>(source, comparer);
    }
}
Note that you really do want an extension method (or at least a generic method of some form) here, because you may not be able to express the type of T explicitly:
var query = from i in Enumerable.Range(0, 10)
            select new { i, j = i + 1 };
var resultSet = query.ToHashSet();
You can't do that with an explicit call to the HashSet<T> constructor. We're relying on type inference for generic methods to do it for us.
Now you could choose to name it ToSet and return ISet<T> - but I'd stick with ToHashSet and the concrete type. This is consistent with the standard LINQ operators (ToDictionary, ToList) and allows for future expansion (e.g. ToSortedSet). You may also want to provide an overload specifying the comparison to use.

Just pass your IEnumerable into the constructor for HashSet.
HashSet<T> foo = new HashSet<T>(from x in bar.Items select x);

This functionality has been added as an extension method on IEnumerable<TSource> to .NET Framework 4.7.2 and .NET Core 2.0. It is consequently also available on .NET 5 and later.
ToHashSet<TSource>(IEnumerable<TSource>)
ToHashSet<TSource>(IEnumerable<TSource>, IEqualityComparer<TSource>)
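A short sketch of the built-in operator, assuming .NET Core 2.0+ or .NET Framework 4.7.2+ (it will not compile against older targets). It also covers the anonymous-type case from Jon's answer, since generic type inference works the same way. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BuiltInToHashSetDemo
{
    // Anonymous types can only be collected through a generic method with inference.
    public static int AnonymousCount() =>
        Enumerable.Range(0, 10).Select(i => new { i, j = i + 1 }).ToHashSet().Count;

    // The comparer overload: names differing only in case collapse into one entry.
    public static HashSet<string> CaseInsensitive(IEnumerable<string> names) =>
        names.ToHashSet(StringComparer.OrdinalIgnoreCase);
}
```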

As @Joel stated, you can just pass your enumerable in. If you want an extension method, you can write:
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> items)
{
    return new HashSet<T>(items);
}

There is an extension method built into the .NET Framework and .NET Core for converting an IEnumerable to a HashSet: https://learn.microsoft.com/en-us/dotnet/api/?term=ToHashSet
public static System.Collections.Generic.HashSet<TSource> ToHashSet<TSource> (this System.Collections.Generic.IEnumerable<TSource> source);
It appears that I cannot use it in .NET Standard libraries yet (at the time of writing), so I use this extension method instead:
[Obsolete("In the .NET framework and in NET core this method is available, " +
"however can't use it in .NET standard yet. When it's added, please remove this method")]
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer = null) => new HashSet<T>(source, comparer);

That's pretty simple :)
var foo = new HashSet<T>(from x in bar.Items select x);
and yes, T is the type specified by the OP :)

If you only need read-only access to the set and the source is a parameter to your method, then I would go with:
public static ISet<T> EnsureSet<T>(this IEnumerable<T> source)
{
    ISet<T> result = source as ISet<T>;
    if (result != null)
        return result;
    return new HashSet<T>(source);
}
The reason is that callers may already pass an ISet to your method, in which case you do not need to create a copy.
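A quick check of that reasoning, as a sketch: passing an existing HashSet<T> returns the very same instance, while any other sequence is copied into a new set. The extension is repeated from the answer inside a hypothetical SetExtensions class so the example stands alone.

```csharp
using System.Collections.Generic;

public static class SetExtensions
{
    // Returns the source itself when it already implements ISet<T>;
    // otherwise copies it into a new HashSet<T> (dropping duplicates).
    public static ISet<T> EnsureSet<T>(this IEnumerable<T> source)
    {
        ISet<T> result = source as ISet<T>;
        if (result != null)
            return result;
        return new HashSet<T>(source);
    }
}
```

Because the caller's own set may be returned, the method must treat the result as read-only, which matches the precondition stated in the answer.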

Jon's answer is perfect. The only caveat is that, using NHibernate's HashedSet, I need to convert the results to a collection. Is there an optimal way to do this?
ISet<string> bla = new HashedSet<string>((from b in strings select b).ToArray());
or
ISet<string> bla = new HashedSet<string>((from b in strings select b).ToList());
Or am I missing something else?
Edit: This is what I ended up doing:
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
{
    return new HashSet<T>(source);
}
public static HashedSet<T> ToHashedSet<T>(this IEnumerable<T> source)
{
    return new HashedSet<T>(source.ToHashSet());
}

Rather than the simple conversion of IEnumerable to a HashSet, it is often convenient to convert a property of another object into a HashSet. You could write this as:
var set = myObject.Select(o => o.Name).ToHashSet();
but, my preference would be to use selectors:
var set = myObject.ToHashSet(o => o.Name);
They do the same thing, and the second is obviously shorter, but I find the idiom fits my brain better (I think of it as being like ToDictionary).
Here's the extension method to use, with support for custom comparers as a bonus.
public static HashSet<TKey> ToHashSet<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> selector,
    IEqualityComparer<TKey> comparer = null)
{
    return new HashSet<TKey>(source.Select(selector), comparer);
}
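A usage sketch of the selector overload, with the method wrapped in a hypothetical SelectorHashSetExtensions class and a hypothetical Person type standing in for the elements of myObject:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectorHashSetExtensions
{
    // Projects each element with selector and collects the distinct keys.
    public static HashSet<TKey> ToHashSet<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IEqualityComparer<TKey> comparer = null)
    {
        return new HashSet<TKey>(source.Select(selector), comparer);
    }
}

// Hypothetical element type for the example.
public class Person
{
    public string Name { get; set; }
}
```

With names "Ann", "ann" and "Bob", people.ToHashSet(p => p.Name) holds three entries, while passing StringComparer.OrdinalIgnoreCase as the comparer collapses them to two.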
