Compare two objects using serialization C#

Why is it not good practice to compare two objects by serializing them and then comparing the strings, as in the following example?
public class Obj
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
}

public class Comparator<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        return JsonConvert.SerializeObject(x) == JsonConvert.SerializeObject(y);
    }

    public int GetHashCode(T obj)
    {
        return JsonConvert.SerializeObject(obj).GetHashCode();
    }
}

Obj o1 = new Obj { Prop1 = 1, Prop2 = "1" };
Obj o2 = new Obj { Prop1 = 1, Prop2 = "2" };
bool result = new Comparator<Obj>().Equals(o1, o2);
I have tested it and it works; it is generic, so it could work for a wide variety of objects. What I am asking is: what are the downsides of this approach to comparing objects?
I have seen it suggested in this question, and it received some upvotes, but I can't figure out why this is not considered the best way when somebody just wants to compare the values of the properties of two objects.
EDIT: I am strictly talking about JSON serialization, not XML.
I am asking this because I want to create a simple and generic comparator for a unit-test project, so the performance of the comparison does not bother me much; I know this may be one of the biggest downsides. Also, the typeless problem can be handled, in the case of Newtonsoft.Json, by setting the TypeNameHandling property to All.
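For reference, this is roughly what I mean (a sketch, assuming Newtonsoft.Json):
var settings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All };
// The emitted JSON now carries "$type" metadata, so two objects of different
// runtime types no longer serialize to identical strings.
string json1 = JsonConvert.SerializeObject(o1, settings);
string json2 = JsonConvert.SerializeObject(o2, settings);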

The primary problem is that it is inefficient.
As an example, imagine this Equals function:
public bool Equals(T x, T y)
{
    return x.Prop1 == y.Prop1
        && x.Prop2 == y.Prop2
        && x.Prop3 == y.Prop3
        && x.Prop4 == y.Prop4
        && x.Prop5 == y.Prop5
        && x.Prop6 == y.Prop6;
}
If the Prop1 values are not the same, the other five comparisons never need to be checked. If you did this with JSON, you would have to convert the entire object into a JSON string and then compare the strings every time, on top of serialization being an expensive task all on its own.
The next problem is that serialization is designed for communication, e.g. from memory to a file, across a network, etc. If you have leveraged serialization for comparison, you can degrade your ability to use it for its normal purpose: you can no longer ignore fields not required for transmission, because ignoring them might break your comparer.
Next, JSON specifically is typeless, which means that values that are not in any way, shape, or form equal may be mistaken for being equal if they happen to serialize to the same string; on the flip side, values that are equal may fail to compare as equal because of formatting differences. This is again unsafe and unstable.
The only upside to this technique is that it requires little effort from the programmer to implement.

You are probably going to keep adding a bounty to the question until somebody tells you that it is just fine to do this. So there you go: don't hesitate to take advantage of the Newtonsoft.Json library to keep the code simple. You just need some good arguments to defend your decision if your code is ever reviewed or if somebody else takes over maintenance of the code.
Some of the objections they may raise, and their counter-arguments:
This is very inefficient code!
It certainly is; GetHashCode() in particular can make your code brutally slow if you ever use the object in a Dictionary or HashSet.
The best counter-argument is to note that efficiency is of little concern in a unit test. The most typical unit test takes longer to get started than to actually execute, and whether it takes 1 millisecond or 1 second is not relevant. And it is a problem you are likely to discover very early.
You are unit-testing a library you did not write!
That is certainly a valid concern: you are in effect testing Newtonsoft.Json's ability to generate a consistent string representation of an object. There is cause to be alarmed about this; in particular, floating point values (float and double) are always a problem. There is also some evidence that the library author is unsure how to do it correctly.
The best counter-argument is that the library is widely used and well maintained, and the author has released many updates over the years. The floating point consistency concern can be reasoned away when you make sure that the exact same program with the exact same runtime environment generates both strings (i.e. don't store them), and you make sure the unit test is built with optimization disabled.
You are not unit-testing the code that needs to be tested!
Yes, you would only write this code if the class itself provides no way to compare objects; in other words, it does not override Equals/GetHashCode and does not expose a comparator. So testing for equality in your unit test exercises a feature that the to-be-tested code does not actually support. That is something a unit test should never do: you can't write a bug report when the test fails.
The counter-argument is that you need to test for equality in order to test another feature of the class, like the constructor or the property setters. A simple comment in the code is enough to document this.

By serializing your objects to JSON, you are basically converting all of your objects to another data type, and so everything that applies to your JSON library will have an impact on your results.
So if there is an attribute like [ScriptIgnore] on one of the properties, your code will simply ignore that property, since it has been omitted from the serialized data.
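For example (a sketch using Newtonsoft.Json's [JsonIgnore]; the Person type here is purely illustrative), two objects that differ only in an ignored property serialize to the same string and therefore compare as equal:
public class Person
{
    public string Name { get; set; }

    [JsonIgnore]
    public string Token { get; set; } // never written to the JSON
}

Person p1 = new Person { Name = "a", Token = "secret-1" };
Person p2 = new Person { Name = "a", Token = "secret-2" };

// Both serialize to {"Name":"a"}, so the serialization-based comparer reports equality.
bool equal = new Comparator<Person>().Equals(p1, p2); // true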
Also, the string results can be the same for objects that are not the same, as in this example:
static void Main(string[] args)
{
    Xb x1 = new X1()
    {
        y1 = 1,
        y2 = 2
    };

    Xb x2 = new X2()
    {
        y1 = 1,
        y2 = 2
    };

    bool result = new Comparator<Xb>().Equals(x1, x2);
}

class Xb
{
    public int y1 { get; set; }
}

class X1 : Xb
{
    public short y2 { get; set; }
}

class X2 : Xb
{
    public long y2 { get; set; }
}
So, as you can see, x1 has a different type than x2, and even the data type of y2 differs between the two, but the JSON results will be the same.
Other than that, since both x1 and x2 are of type Xb, I can call your comparator without any problems.

First, let me restate the comparer under discussion, including its GetHashCode:
public class Comparator<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        return JsonConvert.SerializeObject(x) == JsonConvert.SerializeObject(y);
    }

    public int GetHashCode(T obj)
    {
        return JsonConvert.SerializeObject(obj).GetHashCode();
    }
}
Okay, next, let's discuss the problems with this method.
First, it simply won't work for types with circular references. If you have a property linkage as simple as A -> B -> A, it fails. Unfortunately, this is very common in lists or maps that link back to each other, and, worse, there is hardly an efficient generic loop-detection mechanism.
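To illustrate (a sketch; Node is a made-up type), Json.NET's default ReferenceLoopHandling.Error setting throws on such a graph, and suppressing the error only hides part of the data from the comparison:
public class Node
{
    public Node Next { get; set; }
}

var a = new Node();
var b = new Node { Next = a };
a.Next = b; // A -> B -> A

// With default settings this throws JsonSerializationException
// ("Self referencing loop detected"), so no string is ever produced:
// JsonConvert.SerializeObject(a);

// Ignoring the loop makes it serialize, but part of the graph is silently dropped,
// so structurally different graphs can still end up with identical JSON.
var settings = new JsonSerializerSettings { ReferenceLoopHandling = ReferenceLoopHandling.Ignore };
string json = JsonConvert.SerializeObject(a, settings);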
Second, comparison via serialization is simply inefficient. JSON serialization needs reflection and a lot of type inspection before it can produce its result, so your comparer will become a serious bottleneck in any algorithm. Even with only thousands of records, JSON serialization is usually considered slow.
Third, JSON has to visit every property. That becomes a disaster if your object links to any big object. What if your object links to a big file?
This is why C# simply leaves the implementation to the user. You have to know your class thoroughly before writing a comparator for it: comparison requires good loop detection, early termination, and attention to efficiency. A fully generic solution simply does not exist.

These are some of the downsides:
a) Performance will be increasingly bad the deeper your object tree is.
b) new Obj { Prop1 = 1 } Equals new Obj { Prop1 = "1" } Equals new Obj { Prop1 = 1.0 }
c) new Obj { Prop1 = 1.0, Prop2 = 2.0 } Not Equals new Obj { Prop2 = 2.0, Prop1 = 1.0 }
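One way (c) shows up in practice is with anonymous types, where Json.NET emits properties in declaration order (an illustrative sketch):
var left  = new { Prop1 = 1.0, Prop2 = 2.0 };  // {"Prop1":1.0,"Prop2":2.0}
var right = new { Prop2 = 2.0, Prop1 = 1.0 };  // {"Prop2":2.0,"Prop1":1.0}

// Same values, different strings, so a string-based comparison says "not equal".
bool same = JsonConvert.SerializeObject(left) == JsonConvert.SerializeObject(right); // false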

First, I notice that you say "serialize them and then compare the strings." In general, ordinary string comparison will not work for comparing XML or JSON strings; you have to be a little more sophisticated than that. As a counterexample to string comparison, consider the following XML strings:
<abc></abc>
<abc/>
They are clearly not string equal but they definitely "mean" the same thing. While this example might seem contrived, it turns out that there are quite a few cases where string comparison doesn't work. For example, whitespace and indentation are significant in string comparison but may not be significant in XML.
The situation isn't all that much better for JSON. You can do similar counterexamples for that.
{ abc : "def" }
{
abc : "def"
}
Again, clearly these mean the same thing, but they're not string-equal.
Essentially, if you're doing string comparison you're trusting the serializer to always serialize a particular object in exactly the same way (without any added whitespace, etc), which ends up being remarkably fragile, especially given that most libraries do not, to my knowledge, provide any such guarantee. This is particularly problematic if you update the serialization libraries at some point and there's a subtle difference in how they do the serialization; in this case, if you try to compare a saved object that was serialized with the previous version of the library with one that was serialized with the current version then it wouldn't work.
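If you do want a formatting-independent check with Newtonsoft.Json, one option (a sketch) is to parse both strings and compare the resulting token trees, which ignores whitespace and indentation differences, though not the other issues discussed here:
using Newtonsoft.Json.Linq;

public static bool JsonStructurallyEqual(string left, string right)
{
    // Parse first, then compare the trees rather than the raw text.
    return JToken.DeepEquals(JToken.Parse(left), JToken.Parse(right));
}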
Also, just as a quick note on your code itself: the "==" operator is not, in general, the proper way to compare objects. For most reference types, "==" tests for reference equality, not value equality (string happens to overload "==" to compare values, which is why your comparer works at all).
One more quick digression on hash algorithms: how reliable they are as a means of equality testing depends on how collision resistant they are. In other words, given two different, non-equal objects, what's the probability that they'll hash to the same value? Conversely, if two objects hash to the same value, what are the odds that they're actually equal? A lot of people take it for granted that their hash algorithms are 100% collision resistant (i.e. two objects will hash to the same value if, and only if, they're equal) but this isn't necessarily true. (A particularly well-known example of this is the MD5 cryptographic hash function, whose relatively poor collision resistance has rendered it unsuitable for further use). For a properly-implemented hash function, in most cases the probability that two objects that hash to the same value are actually equal is sufficiently high to be suitable as a means of equality testing but it's not guaranteed.

Object comparison by serializing and then comparing the string representations is not effective in the following cases.
When a property of type DateTime exists in the types that need to be compared:
public class Obj
{
    public DateTime Date { get; set; }
}

Obj o1 = new Obj { Date = DateTime.Now };
Obj o2 = new Obj { Date = DateTime.Now };
bool result = new Comparator<Obj>().Equals(o1, o2);
It will return false even for objects created very close together in time, unless they happen to end up with exactly the same DateTime value.
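If timestamps that are merely close should count as equal, the comparison has to be expressed against the property itself with an explicit tolerance, which a serialized string cannot do. A hypothetical sketch:
static bool CloseEnough(Obj a, Obj b, TimeSpan tolerance)
{
    // Compare the DateTime values directly, allowing a small amount of drift.
    return (a.Date - b.Date).Duration() <= tolerance;
}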
For objects that have double or decimal values which need to be compared with an epsilon to check whether they are actually very close to each other:
public class Obj
{
    public double Double { get; set; }
}

Obj o1 = new Obj { Double = 22222222222222.22222222222 };
Obj o2 = new Obj { Double = 22222222222222.22222222221 };
bool result = new Comparator<Obj>().Equals(o1, o2);
This will also return false even though the double values are really close to each other. In programs that involve calculation this becomes a real problem, because of the precision lost over multiple divide and multiply operations, and serialization does not offer the flexibility to handle such cases.
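A property-wise comparison can express that tolerance directly; the serialized strings cannot. For example (a sketch, with the epsilon chosen by the caller):
static bool NearlyEqual(Obj a, Obj b, double epsilon)
{
    // Treat the doubles as equal when they differ by less than the given epsilon.
    return Math.Abs(a.Double - b.Double) < epsilon;
}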
Also, considering the above cases, if one wants to exclude a property from the comparison, one has to add a serialization attribute to the actual class, even if it is not otherwise needed; this leads to code pollution, or to problems later when that type actually has to be serialized.
Note: these are some of the actual problems of this approach, but I am looking forward to finding others.

For unit tests you don't need to write your own comparer. :)
Just use a modern framework. For example, try the FluentAssertions library:
o1.ShouldBeEquivalentTo(o2);
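Note that newer versions of FluentAssertions express the same check with the Should() form:
o1.Should().BeEquivalentTo(o2);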

Serialization was made for storing an object or sending it over a pipe (network), outside of the current execution context, not for doing something inside the execution context.
Some serialized values might not be considered equal when in fact they are: decimal "1.0" and integer "1", for instance.
Sure, you can do it, just like you can eat with a shovel, but you don't, because you might break your teeth!

You can use the System.Reflection namespace to get all the properties of the instance, as in this answer. With reflection you can compare not only public properties or fields (as with JSON serialization) but also private, protected, and other non-public ones, and you can speed up the comparison: you obviously don't have to examine every property or field once two objects are found to differ (except in the worst case, when only the last property or field differs).
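A minimal sketch of that idea (public properties only, shallow Equals per property, no cycle handling), just to show the early-exit behaviour that serialization cannot give you:
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical sketch: compares public instance properties via reflection,
// stopping at the first mismatch instead of serializing the whole object.
public class ReflectionComparer<T> : IEqualityComparer<T>
{
    private static readonly PropertyInfo[] Properties =
        typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

    public bool Equals(T x, T y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;

        // Early exit: the first differing property ends the comparison.
        return Properties.All(p => object.Equals(p.GetValue(x, null), p.GetValue(y, null)));
    }

    public int GetHashCode(T obj)
    {
        unchecked
        {
            int hash = 17;
            foreach (var p in Properties)
            {
                object value = p.GetValue(obj, null);
                hash = hash * 31 + (value == null ? 0 : value.GetHashCode());
            }
            return hash;
        }
    }
}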

Related

Can an immutable type change its internal state?

The question is simple. Can a type that can change its internal state without it being observable from the outside be considered immutable?
Simplified example:
public struct Matrix
{
    bool determinantEvaluated;
    double determinant;

    public double Determinant
    {
        get // assume thread-safe correctness in the implementation of the getter
        {
            if (!determinantEvaluated)
            {
                determinant = getDeterminant(this);
                determinantEvaluated = true;
            }
            return determinant;
        }
    }
}
UPDATE: Clarified the thread-safeness issue, as it was causing distraction.
It depends.
If you are documenting for authors of client code or reasoning as an author of client code, then you are concerned with the interface of the component (that is, its externally observable state and behavior) and not with its implementation details (like the internal representation).
In this sense, a type is immutable even if it caches state, even if it initializes lazily, etc - as long as these mutations aren't observable externally. In other words, a type is immutable if it behaves as immutable when used through its public interface (or its other intended use cases, if any).
Of course, this can be tricky to get right (with mutable internal state, you may need to concern yourself with thread safety, serialization/marshaling behavior, etc). But assuming you do get it right (to the extent you need, at least) there's no reason not to consider such a type immutable.
Obviously, from the point of view of a compiler or an optimizer, such a type is typically not considered immutable (unless the compiler is sufficiently intelligent or has some "help" like hints or prior knowledge of some types) and any optimizations that were intended for immutable types may not be applicable, if this is the case.
Yes, an immutable object can change its state, provided that the changes are unseen by other components of the software (this usually means caches). Quite like quantum physics: an event needs an observer in order to be an event.
In your case, a possible implementation is something like this:
public class Matrix {
    ...
    private Lazy<Double> m_Determinant = new Lazy<Double>(() => {
        return ... // TODO: Put actual implementation here
    });

    public Double Determinant {
        get {
            return m_Determinant.Value;
        }
    }
}
Note that Lazy<Double> m_Determinant has changing state (m_Determinant.IsValueCreated), which is, however, unobservable.
I'm going to quote Clojure author Rich Hickey here:
If a tree falls in the woods, does it make a sound?
If a pure function mutates some local data in order to produce an immutable return value, is that ok?
It is perfectly reasonable, for performance reasons, to mutate objects that expose APIs which are immutable to the outside. The important thing about immutable objects is their immutability to the outside; everything that is encapsulated within them is fair game.
In a way, in garbage-collected languages like C#, all objects have some state because of the GC. As a consumer, that should not usually concern you.
I'll stick my neck out...
No, an immutable object cannot change its internal state in C# because observing its memory is an option and thus you can observe the uninitialised state. Proof:
public struct Matrix
{
    private bool determinantEvaluated;
    private double determinant;

    public double Determinant
    {
        get
        {
            if (!determinantEvaluated)
            {
                determinant = 1.0;
                determinantEvaluated = true;
            }
            return determinant;
        }
    }
}
then...
using System;
using System.Runtime.InteropServices;

public class Example
{
    public static void Main()
    {
        var unobserved = new Matrix();
        var observed = new Matrix();
        Console.WriteLine(observed.Determinant);

        IntPtr unobservedPtr = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(Matrix)));
        IntPtr observedPtr = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(Matrix)));
        byte[] unobservedMemory = new byte[Marshal.SizeOf(typeof(Matrix))];
        byte[] observedMemory = new byte[Marshal.SizeOf(typeof(Matrix))];

        Marshal.StructureToPtr(unobserved, unobservedPtr, false);
        Marshal.StructureToPtr(observed, observedPtr, false);
        Marshal.Copy(unobservedPtr, unobservedMemory, 0, Marshal.SizeOf(typeof(Matrix)));
        Marshal.Copy(observedPtr, observedMemory, 0, Marshal.SizeOf(typeof(Matrix)));
        Marshal.FreeHGlobal(unobservedPtr);
        Marshal.FreeHGlobal(observedPtr);

        for (int i = 0; i < unobservedMemory.Length; i++)
        {
            if (unobservedMemory[i] != observedMemory[i])
            {
                Console.WriteLine("Not the same");
                return;
            }
        }

        Console.WriteLine("The same");
    }
}
The purpose of specifying a type to be immutable is to establish the following invariant:
If two instances of an immutable type are ever observed to be equal, any publicly-observable reference to one may be replaced with a reference to the other without affecting the behavior of either.
Because .NET provides the ability to compare any two references for equality, it's not possible to achieve perfect equivalence among immutable instances. Nonetheless, the above invariant is still very useful if one regards reference-equality checks as being outside the realm of things for which a class object is responsible.
Note that under this rule, a subclass may define fields beyond those included in an immutable base class, but must not expose them in such a fashion as to violate the above invariant. Further, a class may include mutable fields provided that they never change in any way that affects a class's visible state. Consider something like the hash field in Java's string class. If it's non-zero, the hashCode value of the string is equal to the value stored in the field. If it's zero, the hashCode value of the string is the result of performing certain calculations on the immutable character sequence encapsulated by the string. Storing the result of the aforementioned calculations into the hash field won't affect the hash code of the string; it will merely speed up repeated requests for the value.
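A C# analogue of that String.hash pattern might look like this (a hypothetical sketch; the non-zero sentinel is just one of several ways to mark "not yet computed"):
public sealed class ImmutableName
{
    private readonly string value;
    private int cachedHash; // 0 means "not computed yet"; this mutation is unobservable from outside

    public ImmutableName(string value) { this.value = value; }

    public override bool Equals(object obj)
    {
        var other = obj as ImmutableName;
        return other != null && other.value == value;
    }

    public override int GetHashCode()
    {
        // Lazily compute and cache; every call returns the same value,
        // so callers cannot tell whether the field was already populated.
        if (cachedHash == 0)
            cachedHash = value.GetHashCode() | 1; // forced non-zero so 0 stays the sentinel
        return cachedHash;
    }
}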

Should I define a struct for the sole purpose of one function?

I have a function that needs a custom data type. One way to approach this problem is by defining a struct; however, this is only for a single function, so wouldn't it be better if I just used a dynamic object instead?
For example:
public struct myDataType
{
    public string name { get; set; }
    public string email { get; set; }
    public string token { get; set; }
}

public bool doSomething(string name, string email, string token)
{
    myDataType MDT = new myDataType();
    MDT.name = name;
    MDT.email = email;
    MDT.token = token;
    // Do something with MDT
    return true;
}
Or
public bool doSomething(string name, string email, string token)
{
    dynamic MDT = new ExpandoObject();
    MDT.name = name;
    MDT.email = email;
    MDT.token = token;
    // Do something with MDT
    return true;
}
Note:
While I can define all possible props in the struct, I don't know how many I will need to use.
The example is not real; it just shows the two possible approaches.
That is not the purpose of dynamic. Dynamic is used when you don't know the type until runtime (and have a good reason to have such a scenario). Usages outside of this just de-value the strongly typed nature of C#, allowing code to compile that could be invalid at runtime.
If you need object A with properties B, C, D, then create that object, even if you are going to use it once. Besides, you will need to use that object when something calls your function and needs to access the properties of the returned object. It's better that those properties are known and strongly typed. You can use a struct instead of a class if you prefer, but make it a strongly typed object.
Edit: The original question was edited to indicate that the function does not return the object. Nonetheless, the above still otherwise holds true - that is, this is not a scenario when you don't know the type until runtime, and therefore it is not the right scenario to use dynamic. Since the usage of the object is short-lived, I would use a struct. See here for in-depth discussion on when struct should be used: When to use struct?
There is no meaningful performance difference between the two solutions you came up with.
Note that it is anonymous types, rather than dynamic, for which the compiler generates a backing class containing all the members defined. For this example:
new { Property1 = "something", Property2 = "somethingElse" }
the compiler will generate something like:
class SomeNameChosenByCompiler
{
    public string Property1 { get; set; }
    public string Property2 { get; set; }
}
As you are actually using the object outside of your method, I would go with the struct version, as it makes the code more readable, easier to understand, and perhaps more maintainable over time.
Also, with dynamic you would lose compile-time benefits.
You can do it either way.
My personal preference would be to use the strongly typed struct, so that if I mistype any of the property names I'll find out when I compile the project. If you use the ExpandoObject you won't find out until the code runs.
The other thing to consider is that a struct is a value type while an ExpandoObject is obviously a reference type. This may affect your decision because of the way the two types can be used in the rest of your code. For example, a value-type variable cannot be set to null, and the two follow different copying semantics.
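A quick illustration of that difference (illustrative only; assumes the struct version of myDataType from the question and that System.Dynamic is imported):
var s1 = new myDataType { name = "a" };
var s2 = s1;        // value type: s2 is an independent copy
s2.name = "b";      // s1.name is still "a"

dynamic e1 = new ExpandoObject();
e1.name = "a";
dynamic e2 = e1;    // reference type: both variables point to the same object
e2.name = "b";      // e1.name is now "b" as well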
A variable of a structure type is, in essence, a group of variables stuck together with duct tape. A heap object of a structure type (i.e. a "boxed" struct instance) is processed by the runtime as though it were a class object with a variable of that structure type as its only field; any methods which would operate on the structure as a whole operate on that field, while those which would operate on the fields of the structure operate on its sub-fields.
The ability to bind groups of variables together with duct tape is useful if one will be using the variables as a group; almost all cases where one would want to do that, however, would require that the structure be used in at least two places (e.g. a place it's copied from and a place it's copied to), though there are cases where all the places might be confined to a single function (e.g. one may have variables prevState and currentState, each containing a few fields, and may want to be able to take a snapshot of all the variables in currentState and later revert all the variables to their earlier values). Structures can be good for that.
I would suggest that it's often good to have very bare-bones structure definitions. If one has a method which reads through a list and computes the minimum and maximum values according to some passed-in IComparer<T>, having a structure:
struct MinMaxResult<T> { public T Minimum, Maximum; }
could make things clearer than having a more complicated data type which wraps its fields in properties and tries to enforce invariants such as Maximum >= Minimum, etc. The fact that MinMaxResult is a structure with exposed fields makes it clear that, given the declaration MinMaxResult<T> mmr;, code shouldn't expect mmr.Minimum to have any meaning beyond "the last value written to mmr.Minimum, or default(T) if nothing was written." Everything of interest is going to be in whatever writes to mmr; the more concise the definition of MinMaxResult<T>, the less it will distract from what's actually going on.

How can I compare doubles using a specified tolerance in NUnit?

I am currently developing a C# P/invoke wrapper to a DLL that is part of my product. I have no experience with C# and this is the first significant C# coding I have done. I am acutely aware that I am lacking a lot of knowledge of the finer points and idioms of the language.
My question concerns the unit tests that I am writing for which I am using NUnit. I have a need to compare the values of double[] variables. If I use Assert.AreEqual(...) to do this then the values are compared for exact equality. However, I would like to compare up to a tolerance. There are AreEqual() overloads for scalar real values that admit a delta parameter. However, I have not been able to find an equivalent for arrays. Have I missed something obvious?
At the moment I have solved the problem with the following code:
class Assert : NUnit.Framework.Assert
{
    public static void AreEqual(double[] expected, double[] actual, double delta)
    {
        AreEqual(expected.Length, actual.Length);
        for (int i = 0; i < expected.Length; i++)
        {
            AreEqual(expected[i], actual[i], delta);
        }
    }
}
Whilst this appears to work I wonder if there is a cleaner solution available. In particular I am concerned that using the same name for my derived class is, not only poor style, but could lead to un-anticipated problems down the road.
I would have liked to use extension methods, but I understand they are only viable when there is an instance of the class being extended. Of course, I only ever call static methods on the Assert class.
I'm sorry if this seems a bit nebulous, but my instincts tell me that I'm not doing this the best way and I would like to know how to do it right.
Since the introduction of the fluent assertion syntax in NUnit, the Within() method has been available for this purpose:
double actualValue = 1.989;
double expectedValue = 1.9890;
Assert.That(actualValue, Is.EqualTo(expectedValue).Within(0.00001));
Assert.That(actualValue, Is.EqualTo(expectedValue).Within(1).Ulps);
Assert.That(actualValue, Is.EqualTo(expectedValue).Within(0.1).Percent);
For collections, the default behaviour of Is.EqualTo() is to compare the collections' members individually, with these individual comparisons being modified by Within(). Hence, you can compare two arrays of doubles like so:
var actualDoubles = new double[] {1.0 / 3.0, 0.7, 9.981};
var expectedDoubles = new double[] { 1.1 / 3.3, 0.7, 9.9810};
Assert.That(actualDoubles, Is.EqualTo(expectedDoubles).Within(0.00001));
Assert.That(actualDoubles, Is.EqualTo(expectedDoubles).Within(1).Ulps);
Assert.That(actualDoubles, Is.EqualTo(expectedDoubles).Within(0.1).Percent);
This will compare each element of actualDoubles to the corresponding element in expectedDoubles using the specified tolerance, and will fail if any are not sufficiently close.
I had a need to create a custom assert; in your case there was an alternative provided by the framework, but that didn't work when I wanted a completely custom assert. I solved this by adding a new static class that calls into NUnit.
public static class FooAssert
{
    public static void CountEquals(int expected, FooConsumer consumer)
    {
        int actualCount = 0;
        while (consumer.ConsumeItem() != null)
            actualCount++;
        NUnit.Framework.Assert.AreEqual(expected, actualCount);
    }
}
Then in a test
[Test]
public void BarTest()
{
    // Arrange
    ...
    // Act
    ...
    // Assert
    FooAssert.CountEquals(1, fooConsumer);
}
I know I am a little bit late to the party, but it might still be useful for somebody.
"Better" is always a matter of taste. In this case, I would say yes: make your own assert, without subclassing the NUnit Assert. NUnit already has multiple assert classes with different specific assertions, like CollectionAssert. Otherwise your method is fine.
I think what I would have done is simply define a function somewhere in your test harness:
public static bool AreEqual(double[] expected, double[] actual, double delta)
that does the comparison and returns true or false appropriately. In your test you simply write:
Assert.IsTrue(AreEqual(expected, result, delta));

Type-proofing primitive .NET value types via custom structs: Is it worth the effort?

I'm toying with the idea of making primitive .NET value types more type-safe and more "self-documenting" by wrapping them in custom structs. However, I'm wondering if it's actually ever worth the effort in real-world software.
(That "effort" can be seen below: Having to apply the same code pattern again and again. We're declaring structs and so cannot use inheritance to remove code repetition; and since the overloaded operators must be declared static, they have to be defined for each type separately.)
Take this (admittedly trivial) example:
struct Area
{
    public static implicit operator Area(double x) { return new Area(x); }
    public static implicit operator double(Area area) { return area.x; }

    private Area(double x) { this.x = x; }
    private readonly double x;
}

struct Length
{
    public static implicit operator Length(double x) { return new Length(x); }
    public static implicit operator double(Length length) { return length.x; }

    private Length(double x) { this.x = x; }
    private readonly double x;
}
Both Area and Length are basically a double, but augment it with a specific meaning. If you defined a method such as…
Area CalculateAreaOfRectangleWith(Length width, Length height)
…it would not be possible to directly pass in an Area by accident. So far so good.
BUT: You can easily sidestep this apparently improved type safety simply by casting an Area to double, or by temporarily storing an Area in a double variable and then passing that into the method where a Length is expected:
Area a = 10.0;
double aWithEvilPowers = a;
… = CalculateAreaOfRectangleWith( (double)a, aWithEvilPowers );
Question: Does anyone here have experience with extensive use of such custom struct types in real-world / production software? If so:
Has the wrapping of primitive value types in custom structs ever directly resulted in less bugs, or in more maintainable code, or given any other major advantage(s)?
Or are the benefits of custom structs too small for them to be used in practice?
P.S.: About 5 years have passed since I asked this question. I'm posting some of the experience I've gathered since then as a separate answer.
I did this in a project a couple of years ago, with some mixed results. In my case, it helped a lot to keep track of different kinds of IDs, and to have compile-time errors when the wrong type of IDs were being used. And I can recall a number of occasions where it prevented actual bugs-in-the-making. So, that was the plus side. On the negative side, it was not very intuitive for other developers -- this kind of approach is not very common, and I think other developers got confused with all these new types springing up. Also, I seem to recall we had some problems with serialization, but I can't remember the details (sorry).
So if you are going to go this route, I would recommend a couple of things:
1) Make sure you talk with the other folks on your team first, explain what you're trying to accomplish and see if you can get "buy-in" from everyone. If people don't understand the value, you're going to be constantly fighting against the mentality of "what's the point of all this extra code"?
2) Consider generating your boilerplate code with a tool like T4. It will make the maintenance of the code much easier. In my case, we had about a dozen of these types, and going the code-generation route made changes much easier and much less error prone.
That's my experience. Good luck!
John
This is a logical next step after Hungarian Notation; see an excellent article from Joel here: http://www.joelonsoftware.com/articles/Wrong.html
We did something similar in one project/API where other developers, especially, needed to use some of our interfaces, and it led to significantly fewer "false alarm" support cases, because it made really obvious what was allowed/needed... I would suspect this means measurably fewer bugs, though we never gathered the statistics because the resulting apps were not ours...
In the approximately 5 years since I asked this question, I have often toyed with defining struct value types, but rarely actually done it. Here are some reasons why:
It's not so much that their benefit is too small, but that the cost of defining a struct value type is too high. There's a lot of boilerplate code involved:
Override Equals and implement IEquatable<T>, and override GetHashCode too;
Implement operators == and !=;
Possibly implement IComparable<T> and operators <, <=, > and >= if a type supports ordering;
Override ToString, and implement conversion methods and type-cast operators from/to other related types.
This is simply a lot of repetitive work that is often not necessary when using the underlying primitive types directly.
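For concreteness, here is roughly what that boilerplate adds up to for one hypothetical wrapper (a sketch, not production code; the Meters name is purely illustrative):
using System;

public struct Meters : IEquatable<Meters>, IComparable<Meters>
{
    private readonly double value;

    public Meters(double value) { this.value = value; }

    // Equality boilerplate
    public bool Equals(Meters other) { return value.Equals(other.value); }
    public override bool Equals(object obj) { return obj is Meters && Equals((Meters)obj); }
    public override int GetHashCode() { return value.GetHashCode(); }

    // Ordering boilerplate
    public int CompareTo(Meters other) { return value.CompareTo(other.value); }

    // Operator boilerplate (each operator must be declared per type)
    public static bool operator ==(Meters a, Meters b) { return a.Equals(b); }
    public static bool operator !=(Meters a, Meters b) { return !a.Equals(b); }
    public static bool operator <(Meters a, Meters b) { return a.CompareTo(b) < 0; }
    public static bool operator >(Meters a, Meters b) { return a.CompareTo(b) > 0; }

    // Conversion boilerplate
    public override string ToString() { return value + " m"; }
    public static explicit operator double(Meters m) { return m.value; }
    public static explicit operator Meters(double d) { return new Meters(d); }
}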
Often, there is no reasonable default value (e.g. for value types such as zip codes, dates, times, etc.). Since one can't prohibit the default constructor, defining such types as struct is a bad idea and defining them as class means some runtime overhead (more work for the GC, more memory dereferencing, etc.)
I haven't actually found that wrapping a primitive type in a semantic struct really offers significant benefits with regard to type safety; IntelliSense / code completion and good choices of variable / parameter names can achieve much of the same benefits as specialized value types. On the other hand, as another answer suggests, custom value types can be unintuitive for some developers, so there's an additional cognitive overhead in using them.
There have been some instances where I ended up wrapping a primitive type in a struct, normally when there are certain operations defined on them (e.g. a DecimalDegrees and Radians type with methods to convert between these).
Defining equality and comparison methods, on the other hand, does not necessarily mean that I'd define a custom value type. I might instead use primitive .NET types and provide well-named implementations of IEqualityComparer<T> and IComparer<T> instead.
I don't often use structs.
One thing you may consider is to make your type require a dimension. Consider this:
public enum length_dim { meters, inches }
public enum area_dim { square_meters, square_inches }
public class Area
{
    // Requires using System.Diagnostics; for Debug.Assert.
    public Area(double a, area_dim dim) { this.area = a; this.dim = dim; }
    public Area(Area a) { this.area = a.area; this.dim = a.dim; }

    public Area(Length l1, Length l2)
    {
        Debug.Assert(l1.Dim == l2.Dim);
        this.area = l1.Distance * l2.Distance;
        switch (l1.Dim)
        {
            case length_dim.meters: this.dim = area_dim.square_meters; break;
            case length_dim.inches: this.dim = area_dim.square_inches; break;
        }
    }

    private double area;
    public double Value { get { return this.area; } } // renamed: a member cannot share its enclosing type's name

    private area_dim dim;
    public area_dim Dim { get { return this.dim; } }
}

public class Length
{
    public Length(double dist, length_dim dim)
    {
        this.distance = dist;
        this.dim = dim;
    }

    private length_dim dim;
    public length_dim Dim { get { return this.dim; } }

    private double distance;
    public double Distance { get { return this.distance; } }
}
Notice that nothing can be created from a double alone; the dimension must be specified. My objects are immutable. Requiring the dimension and verifying it would have prevented the Mars Climate Orbiter loss.
I can't say, from my experience, whether this is a good idea or not. It certainly has its pros and cons. A pro is that you get an extra dose of type safety: why accept a double for an angle when you can accept a type that has true angle semantics (degrees to/from radians, constrained to degree values of 0-360, etc.)? A con is that it's not common, so it could be confusing to some developers.
See this link for a real example of a real commercial product that has several types like you describe:
http://geoframework.codeplex.com/
Types include Angle, Latitude, Longitude, Speed, Distance.

C#/Java: Proper Implementation of CompareTo when Equals tests reference identity

I believe this question applies equally well to C# as to Java, because both require that {c,C}ompareTo be consistent with {e,E}quals:
Suppose I want my equals() method to be the same as a reference check, i.e.:
public bool equals(Object o) {
return this == o;
}
In that case, how do I implement compareTo(Object o) (or its generic equivalent)? Part of it is easy, but I'm not sure about the other part:
public int compareTo(Object o) {
    MyClass other = (MyClass)o;
    if (this == other) {
        return 0;
    } else {
        int c = foo.CompareTo(other.foo);
        if (c == 0) {
            // what here?
        } else {
            return c;
        }
    }
}
I can't just blindly return 1 or -1, because the solution should adhere to the normal requirements of compareTo. I can check all the instance fields, but if they are all equal, I'd still like compareTo to return a value other than 0. It should be true that a.compareTo(b) == -(b.compareTo(a)), and the ordering should stay consistent as long as the objects' state doesn't change.
I don't care about ordering across invocations of the virtual machine, however. This makes me think that I could use something like memory address, if I could get at it. Then again, maybe that won't work, because the Garbage Collector could decide to move my objects around.
hashCode is another idea, but I'd like something that will be always unique, not just mostly unique.
Any ideas?
First of all, if you are using Java 5 or above, you should implement Comparable<MyClass> rather than the plain old Comparable; therefore your compareTo method should take a parameter of type MyClass, not Object:
public int compareTo(MyClass other) {
    if (this == other) {
        return 0;
    } else {
        int c = foo.CompareTo(other.foo);
        if (c == 0) {
            // what here?
        } else {
            return c;
        }
    }
}
As for your question, Josh Bloch in Effective Java (Chapter 3, Item 12) says:
The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception if and only if y.compareTo(x) throws an exception.)
This means that if c == 0 in the above code, you must return 0.
That in turn means that you can have objects A and B, which are not equal, but their comparison returns 0. What does Mr. Bloch have to say about this?
It is strongly recommended, but not strictly required, that (x.compareTo(y) == 0) == (x.equals(y)). Generally speaking, any class that implements the Comparable interface and violates this condition should clearly indicate this fact. The recommended language is “Note: This class has a natural ordering that is inconsistent with equals.”
And
A class whose compareTo method imposes an order that is inconsistent with equals will still work, but sorted collections containing elements of the class may not obey the general contract of the appropriate collection interfaces (Collection, Set, or Map). This is because the general contracts for these interfaces are defined in terms of the equals method, but sorted collections use the equality test imposed by compareTo in place of equals. It is not a catastrophe if this happens, but it’s something to be aware of.
Update: So, IMHO, with your current class you cannot make compareTo consistent with equals. If you really need this, the only way I see would be to introduce a new member which gives a strict natural ordering to your class. Then, in case all the meaningful fields of the two objects compare to 0, you could still decide the order of the two based on their special order values.
This extra member may be an instance counter, or a creation timestamp. Or, you could try using a UUID.
In Java or C#, generally speaking, there is no fixed ordering of objects. Instances can be moved around by the garbage collector while executing your compareTo, or the sort operation that's using your compareTo.
As you stated, hash codes are generally not unique, so they're not usable (two different instances with the same hash code bring you back to the original question). And the Java Object.toString implementation, which many people believe surfaces an object id (MyObject@33c0d9d), is nothing more than the object's class name followed by the hash code. As far as I know, neither the JVM nor the CLR has a notion of an instance id.
If you really want a consistent ordering of your classes, you could try using an incrementing number for each new instance you create. Mind you, incrementing this counter must be thread safe, so it's going to be relatively expensive (in C# you could use Interlocked.Increment).
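A sketch of that idea in C# (MyClass and Foo are placeholders for your actual type and its meaningful field):
using System;
using System.Threading;

public class MyClass : IComparable<MyClass>
{
    private static long instanceCounter;

    // Each instance gets a unique, immutable sequence number at creation time.
    private readonly long instanceId = Interlocked.Increment(ref instanceCounter);

    public int Foo { get; set; }

    public int CompareTo(MyClass other)
    {
        if (other == null) return 1;
        if (ReferenceEquals(this, other)) return 0;

        int c = Foo.CompareTo(other.Foo);
        // Fall back to creation order so that distinct instances never compare as 0.
        return c != 0 ? c : instanceId.CompareTo(other.instanceId);
    }
}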
Two objects don't need to be reference equal to be in the same equivalence class. In my opinion, it should be perfectly acceptable for two different objects to be the same for a comparison, but not reference equal. It seems perfectly natural to me, for example, that if you hydrated two different objects from the same row in the database, that they would be the same for comparison purposes, but not reference equal.
I'd actually be more inclined to modify the behavior of equals to reflect how they are compared rather than the other way around. For most purposes that I can think of this would be more natural.
The generic equivalent is easier to deal with, in my opinion; it depends on what your external requirements are. This is an IComparable<MyClass> example:
public int CompareTo(MyClass other) {
    if (other == null) return 1;
    if (this == other) {
        return 0;
    } else {
        return foo.CompareTo(other.foo);
    }
}
If the objects are reference-equal or if foo is equal, that's the end of the comparison, unless there's something secondary to sort on; in that case, add it as the return value when foo.CompareTo(other.foo) == 0.
If your classes have an ID or something, compare on that as the secondary key; otherwise don't worry about it. The collection they're stored in, and the order in which it arrives at these objects to compare, is what will determine the final order in the case of equal objects or equal object.foo values.
