How can I implement an infinite set class? - c#

I'm designing a class library for discrete mathematics, and I can't think of a way to implement an infinite set.
What I have so far is: I have an abstract base class, Set, which implements the interface ISet. For finite sets, I derive a class FiniteSet, which implements each set method. I can then use it like this:
FiniteSet<int> set1 = new FiniteSet<int>(1, 2, 3);
FiniteSet<int> set2 = new FiniteSet<int>(3, 4, 5);
Console.WriteLine(set1); //{1, 2, 3}
Console.WriteLine(set2); //{3, 4, 5}
set1.UnionWith(set2);
Console.WriteLine(set1); //{1, 2, 3, 4, 5}
Now I want to represent an infinite set. I had the idea of deriving another abstract class from set, InfiniteSet, and then developers using the library would have to derive from InfiniteSet to implement their own classes. I'd supply commonly used sets, such as N, Z, Q, and R.
But I have no idea how I'd implement methods like Subset and GetEnumerator - I'm even starting to think it's impossible. How do you enumerate an infinite set in a practical way, so that you can intersect/union it with another infinite set? How can you check, in code, that N is a subset of R? As for the issue of cardinality... well, that's probably a separate question.
All this leads me to the conclusion that my idea for implementing an infinite set is probably the wrong way to go. I'd very much appreciate your input :).
Edit: Just to be clear, I'd also like to represent uncountably infinite sets.
Edit2: I think it's important to remember that the ultimate goal is to implement ISet, meaning that any solution has to provide (as it should) ways to implement all of ISet's methods, the most problematic of which are the enumeration methods and the IsSubsetOf method.

It is not possible to fully implement ISet<T> for uncountably infinite sets.
Here's a proof (courtesy of Bertrand Russell):
Suppose you have created a class MySet<T> that can represent an uncountably infinite set. Now let's consider some MySet<object> objects.
We label a particular MySet<object>, call it instance, "abnormal" if:
instance.Contains(instance) returns true.
Similarly, we would label instance as "normal" if:
instance.Contains(instance) returns false.
Note that this distinction is well-defined for every instance.
Now consider an instance of MySet<MySet<object>> called paradox.
We define paradox as the MySet<MySet<object>> which contains all possible normal instances of MySet<object>.
What should paradox.Contains(paradox) return?
If it returns true, then paradox is abnormal and should have returned false when called on itself.
If it returns false then paradox is normal, and should have returned true when called on itself.
There is no way to implement Contains to resolve this paradox, so there is no way to fully implement ISet<T> for all possible uncountable sets.
Now, if you restrict the cardinality of MySet<T> to be equal to or less than the cardinality of the continuum (|R|), then you will be able to get around this paradox.
Even then, you will not be able to implement Contains or similar methods because doing so would be equivalent to solving the halting problem. (Remember that the set of all C# programs has cardinality equal to |Z| < |R|.)
EDIT
To be more thorough, here is an explanation of my assertion that "doing so would be equivalent to solving the halting problem."
Consider the MySet<string> that consists of all C# programs (as strings) which halt in a finite amount of time (to be precise, suppose they halt for any input). Call it paradox2. This set is "recursively enumerable", meaning that you could implement GetEnumerator on it (not easily, but it is possible). That also means that it is well defined. However, this set is not "decidable", because its complement is not recursively enumerable.
Define a C# program as follows:
using ... // everything
public static class Decider
{
    // must be static: Main is static, and instance fields aren't
    // allowed in a static class anyway
    private static MySet<string> _haltingSet = CreateHaltingSet();

    static void Main(string[] args)
    {
        Console.WriteLine(_haltingSet.Contains(args[0]));
    }
}
Compile the above program, and pass it as input to itself. What happens?
If your Contains method is properly implemented, then you've solved the halting problem. However, we know that that's not possible, so we can only conclude that it is not possible to properly implement Contains, even for countably infinite sets.
You might be able to restrict your MySet<T> class to work for all decidable sets. However, you will still run into all sorts of problems with functions that never halt in a finite amount of time.
As an example, let's pretend we have an arbitrary precision real type called Real, and let's let nonHalting be an instance of MySet<Real> that includes all the non-trivial zeros of the Riemann Zeta function (this is a decidable set). If you can properly implement IsProperSubsetOf on nonHalting to return in a finite amount of time when passed the set of all complex numbers with real part 1/2 (also a decidable set), then you'll win a Millennium Prize.

You're going to have to generalize what you mean by Set.
If you are going to have an infinite set, you won't be able to get a meaningful enumeration over it, so you can't define set operations in terms of operations on enumerations.
If you define a Set<T> in terms of a bool IsMember(T obj) method, it can be used for infinite sets.
You define the union or intersection of two sets as the logical or (respectively, and) of the two sets' IsMember methods.
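A minimal sketch of that approach (the type and method names here are illustrative, not from the answer):

using System;

// A set defined only by a membership predicate - works for infinite sets.
public class PredicateSet<T>
{
    private readonly Func<T, bool> _isMember;

    public PredicateSet(Func<T, bool> isMember) { _isMember = isMember; }

    public bool IsMember(T item) { return _isMember(item); }

    // Union is the logical or; intersection is the logical and.
    public PredicateSet<T> Union(PredicateSet<T> other)
    {
        return new PredicateSet<T>(x => IsMember(x) || other.IsMember(x));
    }

    public PredicateSet<T> Intersect(PredicateSet<T> other)
    {
        return new PredicateSet<T>(x => IsMember(x) && other.IsMember(x));
    }
}

For example, new PredicateSet<int>(n => n % 2 == 0) represents the (infinite) even integers, and membership tests on any union or intersection built from it stay cheap; enumeration, of course, is still unavailable.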

represent uncountably infinite sets
Let's examine this statement in the context of how it is done in practice. For example, when asking whether a set A is a subset of the set Z (the integers), the subject is not Z; not every number in Z is analyzed. What is analyzed is the set in question, A. Because there is no way to compare Ak (A sub k, where k is a number between 1 and |A|) to every value of Z (which is infinite), every value of A must instead be compared against the properties which constitute Z. If every value in A satisfies the properties of Z, then A is a subset of Z.
how can you represent in code R union N
Same process as above. The properties of R are "any real number" - in code this could be "any double that is an actual number" (Math.Pow(-1, .5) returns NaN, for example, and is not in R as a result). The properties of N are "any positive integer" - in code this could be any number where Math.Floor == Math.Ceiling. The union of these two is the union of their properties: any number which adheres to the properties of R or of N - in code, any double that is not NaN, or any number for which Math.Floor == Math.Ceiling.
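A sketch of those membership tests (the method names are invented here; taking N as the positive integers):

using System;

static class SetProperties
{
    // Property of R: an actual (finite) real number.
    public static bool IsInR(double x) { return !double.IsNaN(x) && !double.IsInfinity(x); }

    // Property of N: a positive whole number, i.e. Floor == Ceiling.
    public static bool IsInN(double x) { return IsInR(x) && x >= 1 && Math.Floor(x) == Math.Ceiling(x); }

    // R union N: satisfies the properties of R or of N.
    public static bool IsInRUnionN(double x) { return IsInR(x) || IsInN(x); }
}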
Summary
To represent uncountable infinite sets, use their properties not their values.
edits
N ⊆ R ?
Let's go back to the properties idea, since that is the theme I would pursue. Is N a subset of R? For N to be a subset of R, the properties of N must satisfy all of the properties of R. The list of properties will need to be accurate. To represent the numeric value of infinity, I would suggest a class which contains a nullable int number and a normal int sign.
public class Infinite
{
    public int? Number { get; set; }
    public int Sign { get; set; }
}
Something along those lines. Number == null implies infinite. The Sign can be used to show negative (-1), +/- (0), or positive (1).
Back to the N-subset-of-R situation. Aside from the properties listed earlier, N would also have Infinite.Number == null and Infinite.Sign == 0 as a bound on its properties, as would R, so N would be able to satisfy the boundary property. Next would be the properties defined above. I actually got stuck here: I am not sure how to prove in code that every number for which .Floor == .Ceiling also satisfies the properties of R. However, since there are only 9 of these kinds of super sets (Rational, Irrational, Integer, Real, Complex, Imaginary, Transcendental, Algebraic, Natural), you could specially define their interactions on the infinite scale and then use a simpler implementation for finite comparisons.

What exactly are you going to do with it?
You can't enumerate it.
I think I'd be treating it as a descendant of the universal set.
I think I'd start from the other end:
Define a universal set where IsMember is always true.
Then a descendant where IsMember is true if it's a representation of a natural number.
{1, 2, 3, 4} is a further restriction of N.
A thought anyway.
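A quick sketch of that hierarchy (the class names are mine, not the answerer's):

using System;
using System.Collections.Generic;

// The universal set: IsMember is always true.
class UniversalSet
{
    public virtual bool IsMember(double x) { return true; }
}

// A descendant: true only for representations of natural numbers.
class NaturalSet : UniversalSet
{
    public override bool IsMember(double x) { return x >= 1 && Math.Floor(x) == x; }
}

// {1, 2, 3, 4} as a further restriction of N.
class OneToFourSet : NaturalSet
{
    private static readonly HashSet<double> Members = new HashSet<double> { 1, 2, 3, 4 };
    public override bool IsMember(double x) { return base.IsMember(x) && Members.Contains(x); }
}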

It's possible with lots of limitations, just like symbolic expression handling.
Here is a little example:
using System.Collections.Generic;
using System.Text;

class IntSet
{
    int m_first;
    int m_delta;

    public IntSet(int first, int delta)
    {
        m_first = first;
        m_delta = delta;
    }

    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append('[');
        sb.Append(m_first);
        sb.Append(',');
        sb.Append(m_first + m_delta);
        sb.Append(',');
        sb.Append("...");
        sb.Append(']');
        return sb.ToString();
    }

    public IEnumerable<int> GetNumbers()
    {
        yield return m_first;
        int next = m_first;
        while (true)
        {
            next += m_delta;
            yield return next;
        }
    }
}
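The consumer decides when to stop the (infinite) enumeration, for example with LINQ's Take:

using System;
using System.Linq;

var odds = new IntSet(1, 2); // prints as [1,3,...]
foreach (int n in odds.GetNumbers().Take(5))
    Console.WriteLine(n);    // 1 3 5 7 9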

Related

C# assignment in constructor to member ... doesn't change it

I have a simple class intended to store scaled integral values
using member variables "scaled_value" (long) with a "scale_factor".
I have a constructor that fills a new class instance with a decimal
value (although I think the value type is irrelevant).
Assignment to the "scaled_value" slot appears... to not happen.
I've inserted an explicit assignment of the constant 1 to it.
The Debug.Assert below fails... and scaled_value is zero.
On the assertion break, in the immediate window I can inspect and set "scale_factor" using assignments; it changes as I set it.
I can inspect "scaled_value". It is always zero. I can type an
assignment to it which the immediate window executes, but its value
doesn't change.
I'm using Visual Studio 2017 with C# 2017.
What is magic about this slot?
public class ScaledLong : Base // handles scaled-by-power-of-ten long numbers
// intended to support equivalent of fast decimal arithmetic while hiding scale factors from user
{
    public long scaled_value;  // up to log10_MaxLong digits of decimal precision
    public sbyte scale_factor; // power of ten representing location of decimal point, range -21..+21. Set by constructor AND NEVER CHANGED.
    public byte byte_size;     // holds size of value in underlying memory array
    string format_string;

    <other constructors with same arguments except last value type>

    public ScaledLong(sbyte sf, byte size, string format, decimal initial_value)
    {
        scale_factor = sf;
        byte_size = size;
        format_string = format;
        decimal temp;
        sbyte exponent;
        { // rip exponent out of decimal value, leaving behind an integer
            _decimal_structure.value = initial_value;
            exponent = (sbyte)_decimal_structure.Exponent;
            _decimal_structure.Exponent = 0; // now decimal value is integral
            temp = _decimal_structure.value;
        }
        sbyte sfDelta = (sbyte)(sf - exponent);
        if (sfDelta >= 0)
        {
            this.scaled_value = 1;
            Debug.Assert(scaled_value == 1);
            scaled_value = (long)Math.Truncate(temp * DecimalTenToPower[sfDelta]);
        }
        else
        {
            temp = Math.Truncate(temp / DecimalHalfTenToPower[-sfDelta]);
            temp += (temp % 2); // this can overflow for values at the very top of the range, not worth fixing; note: this works for both + and - numbers (?)
            scaled_value = (long)(temp / 2); // final result
        }
    }
}
The biggest puzzles often have the stupidest foundations. This one is a lesson in unintended side effects.
I found this by thinking it over, wondering how on earth a member could get modified in unexpected ways. I found the solution before I read @mjwills' comment, but he was definitely sniffing at the right thing.
What I left out (of course!) was that I had just coded a ToString() method for the class... that wasn't debugged. Why did I leave it out? Because it obviously can't affect anything, so it can't be part of the problem.
Bzzzzt! It used the member variable as a scratchpad and zeroed it (there's the side effect); that was obviously unintended.
What this means is that when the code just runs, ToString() isn't called and the member variable DOES get modified correctly. (I even had unit tests for the "Set" routine that checked all that, and they were working.)
But when you are debugging... the debugger can (and did, in this case) show local variables. To do that, it will apparently call ToString() to get a nice displayable value. So the act of single-stepping caused ToString() to get called, and its buggy scratch-variable assignment zeroed out the slot after each step.
So it wasn't a setter that bit me. It was arguably a getter. (Where is FORTRAN's PURE keyword when you need it?)
Einstein hated spooky actions at a distance. Programmers hate spooky side effects at a distance.
One wonders a bit at the idea of the debugger calling ToString() on a class whose constructor hasn't finished. What assertions about the state of the class can ToString trust, given the constructor isn't done? I think the MS debugger should be fixed. With that, I would have spent my time debugging ToString instead of chasing this.
Thanks for putting up with my question. It got me to the answer.
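For anyone hitting the same wall, here is a distilled repro of the failure mode (a hypothetical class, not the poster's actual code), along with the [DebuggerDisplay] attribute as one way to keep the debugger from evaluating a side-effecting ToString():

using System.Diagnostics;

// With DebuggerDisplay present, the debugger shows the format string
// below instead of calling ToString() on every step.
[DebuggerDisplay("Value = {_value}")]
public class Scratchy
{
    private long _value = 42;

    public override string ToString()
    {
        // BUG: uses the member as a scratchpad - every debugger
        // evaluation of ToString() silently clobbers _value.
        _value = 0;
        return _value.ToString();
    }
}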
If you still have a copy of that old/buggy code, it would be interesting to try to build it under VS 2019 and Rider (hopefully the latest, 2022.1.1 at this point) with ReSharper (built in) allowed to do the picky scan, and with a .ruleset allowed to complain about just about anything (just for the 1st build - you'll turn off a lot, but you need it to scream in order to see what to turn off). And with .NET 5.0 or 6.0.
The reason I mention it is that I remember MS bragging about doing dataflow analysis to some degree in 2019, and I did see Rider complaining about some "unsafe assignments". If the old code is long lost - never mind.
CTOR-wise: if the CTOR hasn't finished yet, we all know that the object "doesn't exist" yet and has invalid state, but to circumvent that, C# uses default values for everything. When you see code with constant assignments at the point of definition of data members that look trivial and pointless, the reason is that a lot of people remember C++ and don't trust implicit defaults - just in case :-)
There is a 2-phase/round initialization sequence with 2 CTORs and implicit initializations in between. It's not widely documented (so that people with weak hearts don't use it :-) but it is completely deterministic and thread-safe (hidden fuses everywhere). For the sake of its stability you never-ever want to have a call to any method before the 2nd round is done (a plain CTOR being done still doesn't mean a fully constructed object, and any method invocation from the outside may trigger the 2nd round prematurely).
The 1st (plain) CTOR can be used in implicit initializations before the 2nd runs => you can control the (implicit) ordering; you just have to be careful and step through it in the debugger.
Oh, and ToString normally shouldn't be defined at all - on purpose :-) It's de facto intrinsic => the compiler can take its liberties with it. Plus, if you define it, pretty soon you'll be obliged to support (and process) format specifiers.
I used to define ToJson (before the big libs came to the fore) to provide, let's say, a controllable printable (which can also go over the wire and is 10-100 times faster than deserialization). These days the VS debugger has a collection of "visualizers" and an option to tell the debugger whether to use them (when that's off, it will jerk ToString's chain if it sees it).
Also, it's good to have dotPeek (or the actual Reflector, owned by Redgate these days) with "find source code" turned off. Then you see the real generated code, which is sometimes glorious (String is intrinsic and the compiler goes a few extra miles to optimize its operations) and sometimes ugly (async/await - a total faker, inefficient and flat-out dangerous - how do you say "deadlock" in C#? :-) - not kidding), but you need to be able to see the final code or you are driving blind.

Push Item to the end of an array

No, I can't use generic Collections. What I am trying to do is pretty simple actually. In php I would do something like this
$foo = [];
$foo[] = 1;
What I have in C# is this
var foo = new int [10];
// yeah that's pretty much it
Now I could do something like foo[foo.Length - 1] = 1, but that obviously won't keep appending. Another option is foo[foo.Count(x => x.HasValue)] = 1 along with a nullable int during declaration. But there has to be a simpler way around this trivial task.
This is homework and I don't want to explain to my teacher (and possibly the entire class) what foo[foo.Count(x => x.HasValue)] = 1 is and why it works etc.
The simplest way is to create a new class that holds the index of the inserted item:
public class PushPopIntArray
{
    private int[] _vals = new int[10];
    private int _nextIndex = 0;

    public void Push(int val)
    {
        if (_nextIndex >= _vals.Length)
            throw new InvalidOperationException("No more values left to push");
        _vals[_nextIndex] = val;
        _nextIndex++;
    }

    public int Pop()
    {
        if (_nextIndex <= 0)
            throw new InvalidOperationException("No more values left to pop");
        _nextIndex--;
        return _vals[_nextIndex];
    }
}
You could add overloads to get the entire array, or to index directly into it if you wanted. You could also add overloads or constructors to create different sized arrays, etc.
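Usage would look something like this:

var stack = new PushPopIntArray();
stack.Push(1);
stack.Push(2);
Console.WriteLine(stack.Pop()); // 2
Console.WriteLine(stack.Pop()); // 1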
In C#, arrays cannot be resized dynamically. You can use Array.Resize (but this will probably be bad for performance) or switch to the ArrayList type instead.
But there has to be a simpler way around this trivial task.
Nope. Not all languages do everything as easy as each other, this is why Collections were invented. C# <> python <> php <> java. Pick whichever suits you better, but equivalent effort isn't always the case when moving from one language to another.
foo[foo.Length] won't work because foo.Length index is outside the array.
Last item is at index foo.Length - 1
Beyond that, an array is a fixed-size structure; if you expect it to work the same as in PHP, that's simply not how C# arrays behave.
Originally I wrote this as a comment, but I think it contains enough important points to warrant writing it as an answer.
You seem to be under the impression that C# is an awkward language because you stubbornly insist on using an array while having the requirement that you should "push items onto the end", as evidenced by this comment:
Isn't pushing items into the array kind of the entire purpose of the data structure?
To answer that: no, the purpose of the array data structure is to have a contiguous block of pre-allocated memory to mimic the original array structure in C(++) that you can easily index and perform pointer arithmetic on.
If you want a data structure that supports certain operations, such as pushing elements onto the end, consider a System.Collections.Generic.List<T>, or, if you insist on avoiding generics, a System.Collections.ArrayList. In general, the whole point of the class library is that you don't want to concern yourself with storage details: the List<T> class has certain guarantees on its operations (e.g. Add at the end is amortized O(1), retrieval by index is O(1) -- just like an array -- while insertion at an arbitrary position is O(n)), and the array that actually holds the data under the hood is an implementation detail you rarely need to think about.
Don't try to compare PHP and C# by comparing PHP arrays with C# arrays - they have different programming paradigms and the way to solve a problem in one does not necessarily carry over to the other.
To answer the question as written, I see two options then:
Use arrays the awkward way. Either create an array of Nullable<int>s and accept some boxing / unboxing and unpleasant LINQ statements for insertion; or keep an additional counter (preferably wrapped up in a class together with the array) to keep track of the last assigned element.
Use a proper data structure with appropriate guarantees on the operations that matter, such as List<T> which is effectively the (much better, optimised) built-in version of the second option above.
I understand that the latter option is not feasible for you because of the constraints imposed by your teacher, but then do not be surprised that things are harder than the canonical way in another language, if you are not allowed to use the canonical way in this language.
Afterthought:
A hybrid alternative that just came to mind, is using a List for storage and then just calling .ToArray on it. In your insert method, just Add to the list and return the new array.
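A sketch of that hybrid (the class and method names are invented for illustration):

using System.Collections.Generic;

public class GrowableIntArray
{
    private readonly List<int> _items = new List<int>();

    // Add to the backing list, then hand back a plain array.
    public int[] Push(int value)
    {
        _items.Add(value);
        return _items.ToArray(); // O(n) copy per push - fine at homework scale
    }
}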

Preventing a double hash operation when updating a value in a Dictionary<IComparable, int>

I am working on software for scientific research that deals heavily with chemical formulas. I keep track of the contents of a chemical formula using an internal Dictionary<Isotope, int> where Isotope is an object like "Carbon-13", "Nitrogen-14", and the int represents the number of those isotopes in the chemical formula. So the formula C2H3NO would exist like this:
{"C12", 2
"H1", 3
"N14", 1
"O16", 1}
This is all fine and dandy, but when I want to add two chemical formulas together, I end up having to calculate the hash function of Isotope twice to update a value; see the following code example.
public class ChemicalFormula
{
    internal Dictionary<Isotope, int> _isotopes = new Dictionary<Isotope, int>();
    private bool _isDirty; // flags cached state (e.g. the formula string) as stale

    public void Add(Isotope isotope, int count)
    {
        if (count != 0)
        {
            int curValue = 0;
            if (_isotopes.TryGetValue(isotope, out curValue)) // first hash lookup
            {
                int newValue = curValue + count;
                if (newValue == 0)
                {
                    _isotopes.Remove(isotope);    // second hash lookup
                }
                else
                {
                    _isotopes[isotope] = newValue; // second hash lookup
                }
            }
            else
            {
                _isotopes.Add(isotope, count);     // second hash lookup
            }
            _isDirty = true;
        }
    }
}
While this may not seem like it would be a slowdown, it is when we are adding billions of chemical formulas together; this method is consistently the slowest part of the program (>45% of the running time). I am dealing with large chemical formulas like "H5921C3759N1023O1201S21" that are constantly being added to by smaller chemical formulas.
My question is: is there a better data structure for storing data like this? I have tried creating a simple IsotopeCount object that contains an int, so I can access the value through a reference type (as opposed to a value type) to avoid the double hash lookup. However, this didn't seem beneficial.
EDIT
Isotope is immutable and shouldn't change during the lifetime of the program so I should be able to cache the hashcode.
I have linked to the source code so you can see the classes more in depth rather than me copy and paste them here.
I second the opinion that Isotope should be made immutable with a precalculated hash. That would make everything much simpler.
(In fact, functional-style programming is better suited to calculations of this sort, and it deals in immutable objects.)
I have tried creating a simple IsotopeCount object that contains a int so I can access the value in a reference-type (as opposed to value-type) to avoid the double hash function. However, this didn't seem beneficial.
Well it would stop the double hashing... but obviously it's then worse in terms of space. What performance difference did you notice?
Another option you should strongly consider if you're doing this a lot is caching the hash within the Isotope class, assuming it's immutable. (If it's not, then using it as a dictionary key is at least somewhat worrying.)
If you're likely to use most Isotope values as dictionary keys (or candidates) then it's probably worth computing the hash during initialization. Otherwise, pick a particularly unlikely hash value (in an ideal world, that would be any value) and use that as the "uncached" value, and compute it lazily.
If you've got 45% of the running time in GetHashCode, have you looked at optimizing that? Is it actually GetHashCode, or Equals which is the problem? (You talk about "hashing" but I suspect you mean "hash lookup in general".)
If you could post the relevant bits of the Isotope type, we may be able to help more.
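A sketch of the cached-hash idea (the fields below are invented for illustration; the real Isotope class is in the linked source):

public sealed class Isotope
{
    private readonly string _element;
    private readonly int _massNumber;
    private readonly int _hash; // computed once; safe because the type is immutable

    public Isotope(string element, int massNumber)
    {
        _element = element;
        _massNumber = massNumber;
        _hash = element.GetHashCode() * 31 + massNumber;
    }

    public override int GetHashCode() { return _hash; }

    public override bool Equals(object obj)
    {
        var other = obj as Isotope;
        return other != null && other._massNumber == _massNumber && other._element == _element;
    }
}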
EDIT: Another option to consider if you're using .NET 4 would be ConcurrentDictionary, with its AddOrUpdate method. You'd use it like this:
public void Add(Isotope isotope, int count)
{
    // I prefer early exit to lots of nesting :)
    if (count == 0)
    {
        return;
    }
    int newCount = _isotopes.AddOrUpdate(isotope, count,
        (key, oldCount) => oldCount + count);
    if (newCount == 0)
    {
        // ConcurrentDictionary exposes TryRemove rather than Remove
        int ignored;
        _isotopes.TryRemove(isotope, out ignored);
    }
    _isDirty = true;
}
Do you actually require random access to Isotope count by type or are you using the dictionary as a means for associating a key with a value?
I would guess the latter.
My suggestion to you is not to work with a dictionary but with a sorted array (or List) of IsotopeTuples, something like:
class IsotopeTuple
{
    Isotope i;
    int count;
}
sorted by the name of the isotope.
Why the sorting?
Because then, when you want to "add" two isotopes together, you can do this in linear time by traversing both arrays (hope this is clear, I can elaborate if needed). No hash computation required, just super fast comparisons of order.
This is a classic approach when dealing with vector multiplications where the dimensions are words.
Used widely in text mining.
The tradeoff, of course, is that construction of the initial vector is O(n log n), but I doubt you will feel the impact.
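To elaborate on the linear-time addition: with both arrays sorted, you walk them in step, O(n + m) with no hashing at all. A sketch, assuming IsotopeTuple exposes a Name sort key, a Count, and a (name, count) constructor (none of which are spelled out in the answer):

static List<IsotopeTuple> AddFormulas(IsotopeTuple[] a, IsotopeTuple[] b)
{
    var result = new List<IsotopeTuple>();
    int i = 0, j = 0;
    while (i < a.Length && j < b.Length)
    {
        int cmp = string.CompareOrdinal(a[i].Name, b[j].Name);
        if (cmp < 0) result.Add(a[i++]);
        else if (cmp > 0) result.Add(b[j++]);
        else
        {
            // same isotope in both formulas: combine the counts
            result.Add(new IsotopeTuple(a[i].Name, a[i].Count + b[j].Count));
            i++; j++;
        }
    }
    while (i < a.Length) result.Add(a[i++]);
    while (j < b.Length) result.Add(b[j++]);
    return result;
}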
Another solution that you could think of if you had a limited number of Isotopes and no memory problems:
public struct Formula
{
    public int C12;
    public int H1;
    public int N14;
    public int O16;
}
I am guessing you're looking at organic chemistry, so you may not have to deal with that many isotopes, and if the lookup is the issue, this one will be pretty fast...

Using C# HashSet to solve problems where equal is not equal

I'm basing this on performance characteristics I've recently found out about Dictionary, so I'm using Dictionary<type, bool> where the bool is ignored but supposedly I could use HashSet instead.
For example:
Dictionary<bounds, bool> overlap;

class bounds
{
    public float top_left_x, top_left_y, width, height;

    public bool equal(bounds other)
    {
        return top_left_x + width > other.top_left_x &&
               top_left_x < other.top_left_x + other.width &&
               top_left_y + height > other.top_left_y &&
               top_left_y < other.top_left_y + other.height;
    }

    public ... GetHashCode()
    {
        ...;
    }
}
Here I'm not using equal to check for equality but instead for overlapping, which is bound to be annoying elsewhere but there is a reason why I'm doing it.
I'm presuming that if a value can be looked up from a key in O(1) time then so can a key from itself.
So I could presumably put thousands of bounds into overlap and do this:
overlap.ContainsKey(new bounds(...));
To find out in O(1) time if a given bound overlaps any others from the collection.
I'd also like to know what happens if I change the (x, y) position of a bound; presumably it's like removing it and then adding it to the set again, performance-wise - very expensive?
What do I put into the GetHashCode function?
goal
If this works then I'm after using this sort of mechanism to find out what other bounds a given bound overlaps.
Very few bounds move in this system and no new ones are added after the collection has been populated. Newly added bounds need to be able to overlap old ones.
conclusion
See the feedback below for more details.
In summary, it's not possible to achieve O(1) performance because, unlike the default Equals, a check for overlapping is not transitive.
An interval tree however is a good solution.
The equality relation is completely the wrong relation to use here because equality is required to be an equivalence relation. That is, it must be reflexive -- A == A for any A. It must be symmetric -- A == B implies that B == A. And it must be transitive -- if A == B and B == C then A == C.
You are proposing a violation of the transitive property; "overlaps" is not a transitive relation, therefore "overlaps" is not an equivalence relation, and therefore you cannot define equality as overlapping.
Rather than trying to do this dangerous thing, solve the real problem. Your goal is apparently to take a set of intervals, and then quickly determine whether a given interval overlaps any of those intervals. The data structure you want is called an interval tree; it is specifically optimized to solve exactly that problem, so use it. Under no circumstances should you attempt to use a hash set as an interval tree. Use the right tool for the job:
http://wikipedia.org/wiki/Interval_tree
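To see the transitivity failure concretely, consider intervals in one dimension (the same applies to rectangles); this tiny demo is mine, not part of the answer:

using System;

class OverlapDemo
{
    // Two intervals overlap if each starts before the other ends.
    static bool Overlaps(float aLo, float aHi, float bLo, float bHi)
    {
        return aLo < bHi && bLo < aHi;
    }

    static void Main()
    {
        Console.WriteLine(Overlaps(0, 2, 1, 3));    // True:  A overlaps B
        Console.WriteLine(Overlaps(1, 3, 2.5f, 4)); // True:  B overlaps C
        Console.WriteLine(Overlaps(0, 2, 2.5f, 4)); // False: A does not overlap C
    }
}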
Here I'm not using equal to check for equality but instead for overlapping, which is bound to be annoying elsewhere but there is a reason why I'm doing it.
I'm assuming this means you will have a scenario where A.Equals(B) is true, B.Equals(C) is true, but A.Equals(C) is false. In other words, your Equals is not transitive.
That is breaking the rules of Equals(), and as a result Dictionary will not work for you. The rule of Equals/GetHashCode is (from http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx):
If two objects compare as equal, the GetHashCode method for each object must return the same value.
If your Equals is not transitive, then you can't possibly write a valid GetHashCode.
If you use the derived class approach I mentioned above, you need the following:
public class Bounds
{
    public Point position;
    public Point size; // I know the width and height don't really compose
                       // a point, but this is just for demonstration

    public override int GetHashCode() {...}
}

public class OverlappingBounds : Bounds
{
    public override bool Equals(object other)
    {
        // your implementation here
    }
}

// Usage:
if (_bounds.ContainsKey(new OverlappingBounds(...))) {...}
but since the GetHashCode() method then needs to return the same value for every instance, the runtime complexity will most likely be O(n) instead of O(1).
You can't use a Dictionary or HashSet to check if bounds overlap. To be able to use a dictionary (or hashset), you need an Equals() and a GetHashCode() method that meet the following properties:
The Equals() method is an equivalence relation
a.Equals(b) must imply a.GetHashCode() == b.GetHashCode()
You can't meet either of these requirements, so you have to use another data structure: an interval tree.
You cannot guarantee O(1) performance on a dictionary where you customize the hash code calculation. If I put a web service request inside the GetHashCode() method to decide the equality of the two provided items, it's clear that the time can never be O(1) as expected. OK, that's an edge case, but just to give an idea.
By doing it the way you propose (assuming it's even possible), IMO you negate the benefits provided by Dictionary<K,V>, namely constant-time key retrieval even on big collections.
It needs to be measured on a reasonable number of your objects, but I would first try to use List<T> as an object holder and do something like this:
var bounds = new List<Bound> { /* ... initialization ... */ };
Bound providedBound = // something, with some data filled in
var overlappedAny = bounds.Any(b => b.Equals(providedBound));

A set of students have just a name at first. From their 1st exam on they will have both name and score. How to design this in a simple way?

I have a set of Individuals at first:
abstract class Individual
{
    protected double[] characteristics;
    public abstract double CalculateValue();
    ...
}
Each individual has a set of characteristics. Those characteristics will be implemented by the user of my library through inheritance:
class MyIndividualABC : Individual
{
    public MyIndividualABC()
    {
        characteristics = new double[2];
        characteristics[0] = 123;
        characteristics[1] = 456;
    }

    public override double CalculateValue()
    {
        return characteristics[0] / characteristics[1]; // for example
    }
}
I will then take them and give each one of them a score (based on its value). I call this doing an iteration.
class EvaluatedIndividual
{
    double value;
    double score; // score was calculated based on value
}
From iteration 2 onward, I will always have objects of type EvaluatedIndividual at hand. But the first time, I will have to deal with objects of type Individual.
I wouldn't like to have to treat the first iteration in a different way than the other ones. How should I approach this?
An analogy would be to have a class of students. Before the first exam, you just know their names, and from the 1st exam on you will have both their names and the average of the scores of their exams up to that moment.
I have devised 3 different ways to handle this:
I will have a Start() method in my program and then an Iteration() method. Start() takes Individual[] while Iteration() takes EvaluatedIndividual[]. Hacky, but it will work wonders. I'll use this if nothing better comes up.
EvaluatedIndividual inherits from Individual. This looks clean, but I am not too sure about it, because I'd like Individual to be an interface, and my idea is to let the client of my library/framework inherit from it and define the methods/fields on Individual as he wants (Individual could have multiple values that would be taken into consideration when calculating value, for instance). If I follow this approach, I'd have to make him implement a class derived from EvaluatedIndividual as well.
Individual holds value together with both an evaluation and a wasEvaluated flag. At first wasEvaluated is false, so any attempt to read the score throws an exception. I don't particularly like this, but I am not sure why.
How'd you approach this? This is probably a pretty common pattern, but I am afraid I don't know any GoF pattern that matches this scenario :(
Thanks
I'm all for simple; I'd just have one class here. In C# maybe use Nullable<T>:
double? Score {get;set;}
In Java (which obviously lacks Nullable<T>), maybe just expose the score along with an isEvaluated flag (or a count of the number of exams evaluated) - if this is false (or 0) you just ignore the score.
I would have one class with a special value for score that denotes that the individual hasn't been evaluated:
class Individual
{
    private double value;
    private double score = -1; // score was calculated based on value;
                               // -1 means not evaluated

    public bool IsEvaluated() { return score != -1; }
}
IMHO, inheritance is overkill here.
Hey, if this isn't some kind of classroom exercise, don't make things overcomplicated. GoF patterns aren't for everything, you know.
class Individual
{
    double score;
    ScoreState scoreState;
}
I've found that whenever I need one state, I soon need 2, and then shortly 2+n, so you may as well make an enum. This allows for situations like "Pending Review" and other such considerations, which may be handy in the future, and it prevents an excess of flags, which can be confusing. For example:
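A sketch of such an enum (the middle state is just the hypothetical example from above):

enum ScoreState
{
    NotEvaluated,  // before the first exam/iteration
    PendingReview, // e.g. scored but awaiting confirmation
    Evaluated      // the score field is now meaningful
}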
