Using C# HashSet to solve problems where equal is not equal

I'm basing this on performance characteristics I've recently learned about Dictionary: I'm using Dictionary<type, bool> where the bool is ignored, but supposedly I could use HashSet instead.
For example:
Dictionary<bounds, bool> overlap;

class bounds
{
    public float top_left_x, top_left_y, width, height;

    public bool equal(bounds other)
    {
        return top_left_x + width > other.top_left_x &&
               top_left_x < other.top_left_x + other.width &&
               top_left_y + height > other.top_left_y &&
               top_left_y < other.top_left_y + other.height;
    }

    public override int GetHashCode()
    {
        ...;
    }
}
Here I'm not using equal to check for equality but instead for overlapping, which is bound to be annoying elsewhere but there is a reason why I'm doing it.
I'm presuming that if a value can be looked up from a key in O(1) time then so can a key from itself.
So I could presumably put thousands of bounds into overlap and do this:
overlap.ContainsKey(new bounds(...));
To find out in O(1) time if a given bound overlaps any others from the collection.
I'd also like to know what happens if I change the (x, y) position of a bound. Presumably, performance-wise, it's like removing it and then adding it back into the set; very expensive?
What do I put into the GetHashCode function?
goal
If this works then I'm after using this sort of mechanism to find out what other bounds a given bound overlaps.
Very few bounds move in this system and no new ones are added after the collection has been populated. Newly added bounds need to be able to overlap old ones.
conclusion
See the feedback below for more details.
In summary, it's not possible to achieve O(1) performance because, unlike default equality, an overlap check is not transitive.
An interval tree however is a good solution.

The equality relation is completely the wrong relation to use here because equality is required to be an equivalence relation. That is, it must be reflexive -- A == A for any A. It must be symmetric -- A == B implies that B == A. And it must be transitive -- if A == B and B == C then A == C.
You are proposing a violation of the transitive property; "overlaps" is not a transitive relation, therefore "overlaps" is not an equivalence relation, and therefore you cannot define equality as overlapping.
Rather than trying to do this dangerous thing, solve the real problem. Your goal is apparently to take a set of intervals, and then quickly determine whether a given interval overlaps any of those intervals. The data structure you want is called an interval tree; it is specifically optimized to solve exactly that problem, so use it. Under no circumstances should you attempt to use a hash set as an interval tree. Use the right tool for the job:
http://wikipedia.org/wiki/Interval_tree
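For the static use case described in the question ("no new ones are added after the collection has been populated"), even a simplified stand-in for an interval tree does the job. The sketch below is my own illustration, not code from this answer: it handles one axis only; for 2-D bounds you would use it per axis as a prefilter, or use a real interval tree as linked above.

using System;
using System.Linq;

// A sketch, not a full interval tree: for a STATIC set of 1-D intervals,
// sorting by start and keeping running maxima of the ends answers
// "does any stored interval overlap [lo, hi]?" in O(log n).
class StaticIntervalIndex
{
    private readonly double[] _starts;       // interval starts, ascending
    private readonly double[] _prefixMaxEnd; // max end over intervals[0..i]

    public StaticIntervalIndex((double Start, double End)[] intervals)
    {
        var sorted = intervals.OrderBy(iv => iv.Start).ToArray();
        _starts = sorted.Select(iv => iv.Start).ToArray();
        _prefixMaxEnd = new double[sorted.Length];
        double max = double.NegativeInfinity;
        for (int i = 0; i < sorted.Length; i++)
        {
            max = Math.Max(max, sorted[i].End);
            _prefixMaxEnd[i] = max;
        }
    }

    // Closed intervals overlap [lo, hi] iff start <= hi and end >= lo.
    public bool OverlapsAny(double lo, double hi)
    {
        int idx = Array.BinarySearch(_starts, hi);
        if (idx < 0) idx = ~idx - 1; // last interval with start <= hi
        while (idx + 1 < _starts.Length && _starts[idx + 1] <= hi) idx++;
        return idx >= 0 && _prefixMaxEnd[idx] >= lo;
    }
}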

Here I'm not using equal to check for equality but instead for overlapping, which is bound to be annoying elsewhere but there is a reason why I'm doing it.
I'm assuming this means you will have a scenario where A.Equals(B) is true, B.Equals(C) is true, but A.Equals(C) is false. In other words, your Equals is not transitive.
That is breaking the rules of Equals(), and as a result Dictionary will not work for you. The rule of Equals/GetHashCode is (from http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx):
If two objects compare as equal, the GetHashCode method for each object must return the same value.
If your Equals is not transitive, then you can't possibly write a valid GetHashCode.
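A concrete illustration of the intransitivity (my own snippet, using 1-D extents for brevity):

using System;

class OverlapDemo
{
    // Two 1-D extents overlap when each starts before the other ends.
    static bool Overlaps((double Start, double End) x, (double Start, double End) y) =>
        x.Start < y.End && y.Start < x.End;

    static void Main()
    {
        var a = (Start: 0.0, End: 2.0);
        var b = (Start: 1.0, End: 3.0);
        var c = (Start: 2.5, End: 4.0);
        Console.WriteLine(Overlaps(a, b)); // True
        Console.WriteLine(Overlaps(b, c)); // True
        Console.WriteLine(Overlaps(a, c)); // False: "equals" would not be transitive
    }
}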

If you use the derived class approach I mentioned above, you need the following:
public class Bounds
{
    public Point position;
    public Point size; // width and height don't really compose a point,
                       // but this is just for demonstration

    public override int GetHashCode() { ... }
}

public class OverlappingBounds : Bounds
{
    public override bool Equals(object other)
    {
        // your implementation here
    }
}

// Usage:
if (_bounds.ContainsKey(new OverlappingBounds(...))) { ... }
but since the GetHashCode() method would need to return the same value for every instance (any two bounds might "equal" each other by overlapping), all keys land in the same bucket and the runtime complexity degrades to O(n) instead of O(1).

You can't use a Dictionary or HashSet to check if bounds overlap. To be able to use a dictionary (or hashset), you need an Equals() and a GetHashCode() method that meet the following properties:
The Equals() method is an equivalence relation
a.Equals(b) must imply a.GetHashCode() == b.GetHashCode()
You can't meet either of these requirements, so you have to use another data structure: an interval tree.

You can't guarantee O(1) performance on a dictionary when you customize the hash-code calculation. If I put a web-service request inside the GetHashCode() method to decide the equality of the two provided items, it's clear that the time can never be the expected O(1). OK, that's an edge case, but it gives the idea.
By doing it the way you intend (assuming it's even possible), IMO you negate the benefit Dictionary<K,V> provides: constant key-retrieval time even on big collections.
It needs to be measured on a reasonable number of objects, but I would first try using List<T> as an object holder and doing something like this:

var bounds = new List<Bound> { /* ... initialization ... */ };
Bound providedBound = /* something, with some data filled in */;
var overlapsAny = bounds.Any(b => b.Equals(providedBound));

Related

How to define a method that makes assumptions about object state?

In our geometry library, we have a Polygon class, containing (amongst many others, of course) a method which checks if this polygon contains another polygon:
public bool Contains(Polygon other) {
    // complex containment checking code goes here
}
Other important things to note are that the polygon is mutable, and that there is no way to be notified of when its shape changes.
Now, in one of our projects, we have a validation function, which checks for a huge number of polygons whether they are all contained within a single outer boundary.
Polygon boundary = ...;
foreach (var poly in _myMassiveListOfPolygons)
    if (!boundary.Contains(poly))
        //error handling code goes here
Profiling has determined this to be quite a bottleneck. Fortunately there is a way to vastly speed up the calculation. We know that the boundary polygon is always convex, so instead of using the general Contains method, we can write a specialized method that assumes the outer polygon is convex.
Initially I added something like this to the client code:
Func<Polygon, bool> containsFunc;

// Keep both options, in case my assumption about the boundary always being convex is wrong
if (boundary.IsConvex())
{
    containsFunc = p => { ... }; // efficient code for convex boundary goes here
}
else
{
    containsFunc = p => boundary.Contains(p);
}

foreach (var poly in _myMassiveListOfPolygons)
    if (!containsFunc(poly))
        //error handling code goes here
Joy all around! Performance has increased tenfold!
However, it doesn't seem right that the code for this alternative Contains method is located in the client project, where it can't be reused by others.
So I tried to move it to the Polygon class itself.
public bool Contains(Polygon other) {
if (this.IsConvex())
return ConvexContains(other);
else
return GenericContains(other);
}
private bool GenericContains(Polygon other) { ...}
private bool ConvexContains(Polygon other) { ...}
With this method, the client code returns to its original form. Unfortunately, there is one big difference compared to the previous code: before, there was a single call to boundary.IsConvex() to select the correct function, whereas now that method is called on every invocation of Contains! Although this is still faster than using the generic Contains method, most of the performance improvement is lost again (not to mention the performance decrease for concave polygons).
A third possibility would be to have two public methods to check containment, where one of them assumes that the polygon is convex (without doing the check itself, of course, otherwise we're back to the previous overhead problem). This doesn't seem like a good idea, since calling it by accident with a concave polygon could give unexpected results, making for hard to find bugs.
So finally, my question is, is there any good way to deal with this situation? Where should we put the alternative Contains method, and how should we call it?
Edit
I like the idea of caching whether or not a polygon is convex. But I cannot change the interface nor the behavior of the class, which is problematic in a case such as this:
//relevant part of the polygon interface
public List<Point> Points { get; set; }

//client code
var poly = new Polygon();
var list = new List<Point>
{
    new Point(0, 0),
    new Point(10, 0),
    new Point(0, 10)
};
poly.Points = list;
list.Add(new Point(1, 1));
This last line of code is executed on a regular List<Point>, there is nothing I can do about that. However, there would now be 2 requirements:
The polygon must now contain the added point. This is to ensure existing code doesn't break. This means that we cannot copy the Points into our own, observable list.
We must get notified of this change to the list of points (because our polygon has turned from convex to concave). This means that we cannot wrap the list by our own, observable list, because only changes made through the wrapper would cause a notification.
One option is to create another concept in the geometry library, say FixedBoundary, which could encapsulate that logic. It would be immutable so that you could safely cache the convexity. An example:
public class FixedBoundary
{
    private Polygon boundary;
    private bool isConvex;

    public FixedBoundary(Polygon boundary)
    {
        // Deep clone so we don't have to worry about the boundary being modified later.
        this.boundary = boundary.Clone();
        this.isConvex = this.boundary.IsConvex();
    }

    public bool Contains(Polygon p)
    {
        if (isConvex)
        {
            // efficient convex logic here
        }
        else
        {
            return this.boundary.Contains(p);
        }
    }
}
This of course adds some conceptual weight to the geometry library. Another option would be to add an ImmutablePolygon (and by extension ImmutablePoint) which could do the same, however conversions may be cumbersome and affect performance as well.
You could replicate what you've already done with the Func<T, bool> delegate, internal to the Polygon, perhaps like this:
private Func<Polygon, bool> _containsFunc;

// ...

public bool Contains(Polygon other) {
    if (_containsFunc == null) {
        if (this.IsConvex())
            _containsFunc = ConvexContains;
        else
            _containsFunc = GenericContains;
    }
    return _containsFunc(other);
}
Each call to Contains after the first will not call IsConvex. Unless I have misunderstood you, that sounds like what you're after.
Okay... you really shot yourselves in the foot by using List in your public interface. Here is how you can try to solve it (there is a small non-zero chance that this will cache incorrectly, but it's a 1 in 2^32 chance).
As you have noted, we could cache the polygon's convexity... but the problem then becomes one of cache invalidation.
Without being able to invalidate the cache when the object changes, you would have to check cache coherence on each call. This is where hashing comes in.
By default the hash of a List is quite useless (it is not based on the list's contents), so you want a hash computed from the points themselves.
So what you want to do is add a Nullable<int> _cachedHashCode to your polygon. When you call Polygon.IsConvex, hash your List<Point>, compare it with _cachedHashCode, and if they differ, recompute IsConvex.
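A minimal sketch of that scheme; Points is the existing property, while PointsHash and ComputeIsConvex are hypothetical helpers invented for this illustration:

public partial class Polygon
{
    private int? _cachedPointsHash; // null means "never computed"
    private bool _cachedIsConvex;

    public bool IsConvex()
    {
        int hash = PointsHash();
        if (_cachedPointsHash != hash)
        {
            _cachedIsConvex = ComputeIsConvex(); // hypothetical: the real geometric test
            _cachedPointsHash = hash;
        }
        return _cachedIsConvex;
    }

    // Content-based hash over the point list; a collision after an edit
    // would (very rarely) serve a stale answer, as noted above.
    private int PointsHash()
    {
        unchecked
        {
            int h = 17;
            foreach (var p in Points)
                h = h * 31 + p.GetHashCode();
            return h;
        }
    }
}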

Is Nullable<T>.GetHashCode() a poor hash code function?

The implementation of Nullable<T>.GetHashCode() is as follows:
public override int GetHashCode()
{
    if (!this.HasValue)
    {
        return 0;
    }
    return this.value.GetHashCode();
}
If, however, the underlying value also generates a hash code of 0 (e.g. a bool set to false or an Int32 set to 0), then we have two commonly occurring, different object states with the same hash code. It seems to me that a better implementation would have been something like:
public override int GetHashCode()
{
    if (!this.HasValue)
    {
        // Some arbitrary 32-bit value with a good mix of set and unset bits.
        // (The cast is needed: the literal 0xD523648A doesn't fit in an int.)
        return unchecked((int)0xD523648A);
    }
    return this.value.GetHashCode();
}
Yes, you do have a point. It is always possible to write a better GetHashCode() implementation if you know up front what data you are going to store. Not a luxury that a library writer ever has available. But yes, if you have a lot of bool? that are either false or !HasValue then the default implementation is going to hurt. Same for enums and ints, zero is a common value.
Your argument is academic however, changing the implementation costs minus ten thousand points and you can't do it yourself. Best you can do is submit the suggestion, the proper channel is the user-voice site. Getting traction on this is going to be difficult, good luck.
Let's first note that this question is just about performance. The hash code is not required to be unique or collision resistant for correctness. It is helpful for performance though.
Actually, this is the main value proposition of a hash table: Practically evenly distributed hash codes lead to O(1) behavior.
So what hash code constant is most likely to lead to the best possible performance profile in real applications?
Certainly not 0 because 0 is a common hash code: 0.GetHashCode() == 0. That goes for other types as well. 0 is the worst candidate because it tends to occur so often.
So how to avoid collisions? My proposal:
static readonly int nullableDefaultHashCode = GetRandomInt32();
public override int GetHashCode()
{
if (!this.HasValue)
return nullableDefaultHashCode;
else
return this.value.GetHashCode();
}
Evenly distributed, unlikely to collide and no stylistic problem of choosing an arbitrary constant.
Note, that GetRandomInt32 could be implemented as return 0xD523648A;. It would still be more useful than return 0;. But it is probably best to query a cheap source of pseudo-random numbers.
In the end, a Nullable<T> without value has to return a hashcode, and that hashcode should be a constant.
Returning an arbitrary constant may look more safe or appropriate, perhaps even more so when viewed within the specific case of Nullable<int>, but in the end it's just that: a hash.
And within the entire set that Nullable<T> can cover (which is infinite), zero is not a better hashcode than any other value.
I don't understand the concern here - poor performance in what situation?
Why would you consider a hash function poor based on its result for one value?
I could see that it would be a problem if many different values of a Type hash to the same result. But the fact that null hashes to the same value as 0 seems insignificant.
As far as I know the most common use of a .NET hash function is for a Hashtable, HashSet or Dictionary key, and the fact that zero and null happen to be in the same bucket will have an insignificant effect on overall performance.
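For reference, the collision under discussion is easy to observe directly; it follows from the implementation shown in the question:

using System;

class NullableHashDemo
{
    static void Main()
    {
        int? noValue = null;
        int? zero = 0;
        bool? notTrue = false;

        Console.WriteLine(noValue.GetHashCode()); // 0 (the !HasValue branch)
        Console.WriteLine(zero.GetHashCode());    // 0: collides with the null case
        Console.WriteLine(notTrue.GetHashCode()); // 0: false hashes to 0 as well
    }
}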

Dictionary performance improvement

I'm trying to improve on some code that was written a while back. The function is quite important to the core functionality of the system, so I am cautious about a drastic overhaul.
I am using a dictionary to hold objects
Dictionary<Node, int> dConnections
The object Node is in itself a complex object containing many attributes and some lists.
This Dictionary could get quite large, holding around 100 or more entries.
Currently the dictionary is being checked if it contains a node like
dConnections.ContainsKey(Node)
So I am presuming that, to check whether this node is in the dictionary, the dictionary will have to check whether the whole node and its attributes match a node in the dictionary (iterating through the dictionary until it finds a match), and that this will have a major impact on performance?
Would I be better off not using an object as the key and using an object ID instead?
The .NET Dictionary is a hashtable on the inside. This means that if Node doesn't override the GetHashCode and Equals methods, a ContainsKey call will match against:
Disclaimer: this is a summary; things are a little more complicated. Please don't hold the oversimplification against me.
a partition of the hash code of the Node object's reference. The number of partitions depends on the number of buckets in the hashtable (which in turn depends on the total number of keys in the dictionary);
the exact reference, if more than one Node lands in the same bucket.
This algorithm is very efficient. When you say that you have 100 or more entries in the dictionary, that's not "a lot"; it's a few.
It also means that the content of the Node object has nothing to do with the way ContainsKey matches. It will match against the exact same reference, and only against that reference.
If you implement GetHashCode and Equals yourself, be aware that the values these methods return must not change when the instance's properties change (i.e. they should behave as if the key were immutable). Otherwise you can easily end up with keys in the wrong bucket, which makes them completely unreachable (short of enumerating the whole dictionary).
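A short illustration of that failure mode, using a deliberately bad, hypothetical key type written for this example:

using System;
using System.Collections.Generic;

class MutableKey
{
    public int Id;
    public override int GetHashCode() => Id;
    public override bool Equals(object o) => o is MutableKey k && k.Id == Id;
}

class LostKeyDemo
{
    static void Main()
    {
        var map = new Dictionary<MutableKey, string>();
        var key = new MutableKey { Id = 1 };
        map[key] = "value";

        key.Id = 2; // the key now hashes to a different bucket

        Console.WriteLine(map.ContainsKey(key)); // False: the entry is unreachable
        Console.WriteLine(map.Count);            // 1: ...but it is still in there
    }
}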
it will keep on iterating through the dictionary until it finds a match
No, dictionaries don't find matches by iterating all nodes; the hash code is obtained first, and is used to limit the candidates to one, maybe a few (depending on how good your hashing method is, and the bucket size).
So I am presuming that (to check if this node is in the dictionary) the dictionary will have to check if the whole node and its attributes match a node in the dictionary
No. For each candidate, it first checks the hash code, which is intended to be a shortcut to detect non-equality vs. possible-equality very quickly.
So the key here is: your Node's hashing method, aka GetHashCode. If this is complex, then another trick is to cache it the first time you need it, i.e.
int cachedHashCode;

public override int GetHashCode() {
    if (cachedHashCode == 0) {
        cachedHashCode = /* some complex code here */;
        if (cachedHashCode == 0) {
            cachedHashCode = -45; // why not... just something non-zero
        }
    }
    return cachedHashCode;
}
Note that it does still use Equals too, as the final "are they the same", so you obviously want Equals to be as fast as possible too - but Equals will be called relatively rarely.

Preventing a double hash operation when updating a value in a Dictionary<IComparable, int>

I am working on software for scientific research that deals heavily with chemical formulas. I keep track of the contents of a chemical formula using an internal Dictionary<Isotope, int> where Isotope is an object like "Carbon-13", "Nitrogen-14", and the int represents the number of those isotopes in the chemical formula. So the formula C2H3NO would exist like this:
{"C12", 2
"H1", 3
"N14", 1
"O16", 1}
This is all fine and dandy, but when I want to add two chemical formulas together, I end up having to calculate the hash of an Isotope twice to update a value; see the following code example.
public class ChemicalFormula {
    internal Dictionary<Isotope, int> _isotopes = new Dictionary<Isotope, int>();

    public void Add(Isotope isotope, int count)
    {
        if (count != 0)
        {
            int curValue = 0;
            if (_isotopes.TryGetValue(isotope, out curValue))
            {
                int newValue = curValue + count;
                if (newValue == 0)
                {
                    _isotopes.Remove(isotope);
                }
                else
                {
                    _isotopes[isotope] = newValue;
                }
            }
            else
            {
                _isotopes.Add(isotope, count);
            }
            _isDirty = true;
        }
    }
}
While this may not seem like much of a slowdown, it is when we are adding billions of chemical formulas together: this method is consistently the slowest part of the program (>45% of the running time). I am dealing with large chemical formulas like "H5921C3759N1023O1201S21" that are constantly being added to by smaller chemical formulas.
My question is: is there a better data structure for storing data like this? I have tried creating a simple IsotopeCount object that contains an int, so I could access the value through a reference type (as opposed to a value type) and avoid the double hash lookup. However, this didn't seem beneficial.
EDIT
Isotope is immutable and shouldn't change during the lifetime of the program so I should be able to cache the hashcode.
I have linked to the source code so you can see the classes in more depth, rather than copying and pasting them here.
I second the opinion that Isotope should be made immutable with precalculated hash. That would make everything much simpler.
(in fact, functionally-oriented programming is better suited for calculations of such sort, and it deals with immutable objects)
I have tried creating a simple IsotopeCount object that contains a int so I can access the value in a reference-type (as opposed to value-type) to avoid the double hash function. However, this didn't seem beneficial.
Well it would stop the double hashing... but obviously it's then worse in terms of space. What performance difference did you notice?
Another option you should strongly consider if you're doing this a lot is caching the hash within the Isotope class, assuming it's immutable. (If it's not, then using it as a dictionary key is at least somewhat worrying.)
If you're likely to use most Isotope values as dictionary keys (or candidates) then it's probably worth computing the hash during initialization. Otherwise, pick a particularly unlikely hash value (in an ideal world, that would be any value) and use that as the "uncached" value, and compute it lazily.
If you've got 45% of the running time in GetHashCode, have you looked at optimizing that? Is it actually GetHashCode, or Equals which is the problem? (You talk about "hashing" but I suspect you mean "hash lookup in general".)
If you could post the relevant bits of the Isotope type, we may be able to help more.
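Meanwhile, here is a sketch of that lazy caching; since the real Isotope class isn't shown here, it assumes the identity is a name plus a mass number, with 0 as the "not yet computed" sentinel:

public sealed class Isotope
{
    public string Name { get; }
    public int MassNumber { get; }

    private int _hash; // 0 doubles as the "not computed yet" sentinel

    public Isotope(string name, int massNumber)
    {
        Name = name;
        MassNumber = massNumber;
    }

    public override int GetHashCode()
    {
        if (_hash == 0)
        {
            int h = unchecked(Name.GetHashCode() * 31 + MassNumber);
            _hash = (h == 0) ? -45 : h; // never store the sentinel itself
        }
        return _hash;
    }

    public override bool Equals(object obj) =>
        obj is Isotope other && other.Name == Name && other.MassNumber == MassNumber;
}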
EDIT: Another option to consider if you're using .NET 4 would be ConcurrentDictionary, with its AddOrUpdate method. You'd use it like this:
public void Add(Isotope isotope, int count)
{
    // I prefer early exit to lots of nesting :)
    if (count == 0)
    {
        return;
    }
    int newCount = _isotopes.AddOrUpdate(isotope, count,
        (key, oldCount) => oldCount + count);
    if (newCount == 0)
    {
        // ConcurrentDictionary exposes TryRemove rather than Remove.
        _isotopes.TryRemove(isotope, out _);
    }
    _isDirty = true;
}
Do you actually require random access to Isotope count by type or are you using the dictionary as a means for associating a key with a value?
I would guess the latter.
My suggestion to you is not to work with a dictionary but with a sorted array (or List) of IsotopeTuples, something like:
class IsotopeTuple {
    Isotope i;
    int count;
}
sorted by the name of the isotope.
Why the sorting?
Because then, when you want to "add" two formulas together, you can do this in linear time by traversing both arrays (a sketch follows this answer). No hash computation required, just super-fast comparisons of order.
This is a classic approach when dealing with vector multiplications where the dimensions are words.
Used widely in text mining.
The tradeoff, of course, is that constructing the initial vector costs O(n log n), but I doubt you will feel the impact.
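A rough sketch of that linear-time merge, keying on isotope names and dropping zero counts the way the original Add method does:

using System.Collections.Generic;

static class FormulaMerge
{
    // Merge two formulas kept as lists sorted by isotope name: O(n + m),
    // no hashing, only ordinal string comparisons.
    public static List<(string Isotope, int Count)> Add(
        List<(string Isotope, int Count)> a,
        List<(string Isotope, int Count)> b)
    {
        var result = new List<(string Isotope, int Count)>(a.Count + b.Count);
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
        {
            int cmp = string.CompareOrdinal(a[i].Isotope, b[j].Isotope);
            if (cmp < 0) result.Add(a[i++]);
            else if (cmp > 0) result.Add(b[j++]);
            else
            {
                int sum = a[i].Count + b[j].Count;
                if (sum != 0) result.Add((a[i].Isotope, sum)); // drop zeroed entries
                i++; j++;
            }
        }
        while (i < a.Count) result.Add(a[i++]);
        while (j < b.Count) result.Add(b[j++]);
        return result;
    }
}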
Another solution that you could think of if you had a limited number of Isotopes and no memory problems:
public struct Formula
{
    public int C12;
    public int H1;
    public int N14;
    public int O16;
}
I am guessing you're looking at organic chemistry, so you may not have to deal with that many isotopes, and if the lookup is the issue, this one will be pretty fast...

How can I implement an infinite set class?

I'm designing a class library for discrete mathematics, and I can't think of a way to implement an infinite set.
What I have so far is: I have an abstract base class, Set, which implements the interface ISet. For finite sets, I derive a class FiniteSet, which implements each set method. I can then use it like this:
FiniteSet<int> set1 = new FiniteSet<int>(1, 2, 3);
FiniteSet<int> set2 = new FiniteSet<int>(3, 4, 5);
Console.WriteLine(set1); //{1, 2, 3}
Console.WriteLine(set2); //{3, 4, 5}
set1.UnionWith(set2);
Console.WriteLine(set1); //{1, 2, 3, 4, 5}
Now I want to represent an infinite set. I had the idea of deriving another abstract class from set, InfiniteSet, and then developers using the library would have to derive from InfiniteSet to implement their own classes. I'd supply commonly used sets, such as N, Z, Q, and R.
But I have no idea how I'd implement methods like Subset and GetEnumerator - I'm even starting to think it's impossible. How do you enumerate an infinite set in a practical way, so that you can intersect/union it with another infinite set? How can you check, in code, that N is a subset of R? As for the issue of cardinality... well, that's probably a separate question.
All this leads me to the conclusion that my idea for implementing an infinite set is probably the wrong way to go. I'd very much appreciate your input :).
Edit: Just to be clear, I'd also like to represent uncountably infinite sets.
Edit2: I think it's important to remember that the ultimate goal is to implement ISet, meaning that any solution has to provide (as it should) ways to implement all of ISet's methods, the most problematic of which are the enumeration methods and the IsSubsetOf method.
It is not possible to fully implement ISet<T> for uncountably infinite sets.
Here's a proof (courtesy of Bertrand Russell):
Suppose you have created a class MySet<T> that can represent an uncountably infinite set. Now let's consider some MySet<object> objects.
We label a particular MySet<object>, call it instance, "abnormal" if:
instance.Contains(instance) returns true.
Similarly, we would label instance as "normal" if:
instance.Contains(instance) returns false.
Note that this distinction is well-defined for every instance.
Now consider an instance of MySet<MySet<object>> called paradox.
We define paradox as the MySet<MySet<object>> which contains all possible normal instances of MySet<object>.
What should paradox.Contains(paradox) return?
If it returns true, then paradox is abnormal and should have returned false when called on itself.
If it returns false then paradox is normal, and should have returned true when called on itself.
There is no way to implement Contains to resolve this paradox, so there is no way to fully implement ISet<T> for all possible uncountable sets.
Now, if you restrict the cardinality of MySet<T> to be equal to or less than the cardinality of the continuum (|R|), then you will be able to get around this paradox.
Even then, you will not be able to implement Contains or similar methods because doing so would be equivalent to solving the halting problem. (Remember that the set of all C# programs has cardinality equal to |Z| < |R|.)
EDIT
To be more thorough, here is an explanation of my assertion that "doing so would be equivalent to solving the halting problem."
Consider the MySet<string> that consists of all C# programs (as strings) which halt in a finite amount of time (to be precise, suppose they halt on any input). Call it paradox2. The set is "recursively enumerable", meaning that you could implement GetEnumerator on it (not easily, but it is possible). That also means that it is well defined. However, this set is not "decidable" because its complement is not recursively enumerable.
Define a C# program as follows:
using ...; // everything

public static class Decider {
    private static readonly MySet<string> _haltingSet = CreateHaltingSet();

    static void Main(string[] args) {
        Console.WriteLine(_haltingSet.Contains(args[0]));
    }
}
Compile the above program, and pass it as input to itself. What happens?
If your Contains method is properly implemented, then you've solved the halting problem. However, we know that that's not possible, so we can only conclude that it is not possible to properly implement Contains, even for countably infinite sets.
You might be able to restrict your MySet<T> class to work for all decidable sets. However, then you will still run into all sorts of problems with your function never halting in a finite amount of time.
As an example, let's pretend we have an arbitrary precision real type called Real, and let's let nonHalting be an instance of MySet<Real> that includes all the non-trivial zeros of the Riemann Zeta function (this is a decidable set). If you can properly implement IsProperSubsetOf on nonHalting to return in a finite amount of time when passed the set of all complex numbers with real part 1/2 (also a decidable set), then you'll win a Millennium Prize.
You're going to have to generalize what you mean by Set.
If you are going to have an infinite set, you won't be able to get a meaningful Enumeration over it, so you won't define set operations with operations on enumerations.
If you define a Set<f> in terms of a bool IsMember(f obj) method, it can be used for infinite sets.
You define the union or intersection of two sets as the logical OR or AND of the IsMember methods of the two sets (see the sketch below).
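A minimal sketch of that idea: a set is just a membership predicate, and union/intersection compose the predicates with || and &&.

using System;

// A set represented purely by its membership predicate.
class PredicateSet<T>
{
    private readonly Func<T, bool> _isMember;

    public PredicateSet(Func<T, bool> isMember) => _isMember = isMember;

    public bool IsMember(T item) => _isMember(item);

    public PredicateSet<T> Union(PredicateSet<T> other) =>
        new PredicateSet<T>(x => _isMember(x) || other._isMember(x));

    public PredicateSet<T> Intersect(PredicateSet<T> other) =>
        new PredicateSet<T>(x => _isMember(x) && other._isMember(x));
}

class PredicateSetDemo
{
    static void Main()
    {
        // The intersection of the even integers with the positive integers.
        var evens = new PredicateSet<int>(n => n % 2 == 0);
        var positives = new PredicateSet<int>(n => n > 0);
        Console.WriteLine(evens.Intersect(positives).IsMember(4));  // True
        Console.WriteLine(evens.Intersect(positives).IsMember(-4)); // False
    }
}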
represent uncountably infinite sets
Let's examine this statement in the context of how it is done in practice. For example, when asking whether a set A is a subset of the set Z (the positive integers), the subject is not Z. Not every number in Z is analyzed; what is analyzed is the set in question, A. Because there is no way to compare A_k (A sub k, where k is a number between 1 and |A|) to every value of Z (which is infinite), every value of A must be compared against the properties which constitute Z. If every value in A satisfies the properties of Z, then A is a subset of Z.
how can you represent in code R union N
Same process as above. The properties of R are "any real number"; in code this could be "any double that doesn't throw an exception" (obviously Math.Pow(-1, .5) will give issues and is therefore not in R). The properties of N are "any integer"; in code this could be any number where Math.Floor == Math.Ceiling. The union of these two is the union of their properties: any number which adheres to the properties of R or N; in code, any number which can be created without an exception or for which Math.Floor == Math.Ceiling.
Summary
To represent uncountable infinite sets, use their properties not their values.
edits
N ⊆ R ?
Let's go back to the properties idea, since that is the theme I would pursue. Is N a subset of R? For N to be a subset of R, the properties of N must satisfy all of the properties of R. The list of properties will need to be accurate. To represent the numeric value of infinity, I would suggest using a class which contains a nullable int Number and a normal int Sign:
public class Infinite
{
    public int? Number { get; set; }
    public int Sign { get; set; }
}
Something along those lines. Number == null implies infinite. The Sign can be used to show negative (-1), +/- (0), or positive (1).
Back to the N-subset-of-R situation. Aside from the properties listed earlier, N would also have Infinite.Number == null and Infinite.Sign == 0 as a bound on its properties, as would R. So N satisfies the boundary property. Next come the properties defined above. I actually got stuck here: I am not sure how to prove in code that every number where .Floor == .Ceiling will not cause an exception. However, since there are only 9 of these types of super sets (rational, irrational, integer, real, complex, imaginary, transcendental, algebraic, natural), you could specially define their interactions on the infinite scale and then use a simpler implementation for finite comparisons.
What exactly are you going to do with it? You can't enumerate it.
I'm thinking I'd treat it as a descendant of the universal set.
I think I'd start from the other end:
Define a universal set where IsMember is always true.
Then a descendant where IsMember is true if it's a representation of a natural number.
{1, 2, 3, 4} is a further restriction of N.
A thought, anyway.
It's possible with lots of limitations, just like symbolic expression handling.
Here is a little example:
class IntSet
{
    int m_first;
    int m_delta;

    public IntSet(int first, int delta)
    {
        m_first = first;
        m_delta = delta;
    }

    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append('[');
        sb.Append(m_first);
        sb.Append(',');
        sb.Append(m_first + m_delta);
        sb.Append(',');
        sb.Append("...");
        sb.Append(']');
        return sb.ToString();
    }

    public IEnumerable<int> GetNumbers()
    {
        yield return m_first;
        int next = m_first;
        while (true)
        {
            next += m_delta;
            yield return next;
        }
    }
}
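A usage sketch (Take from System.Linq keeps the infinite enumeration finite):

using System;
using System.Linq;

class IntSetDemo
{
    static void Main()
    {
        var odds = new IntSet(1, 2);     // 1, 3, 5, ...
        Console.WriteLine(odds);         // [1,3,...]
        foreach (int n in odds.GetNumbers().Take(5))
            Console.Write(n + " ");      // 1 3 5 7 9
    }
}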
