Yesterday I spent some time trying to find a bug. Long story short, I finally realized that it was caused by this constructor:
public Triangle(List<Vertex> vertices) {
    this._values = vertices;
}
I tried to initialize an object with a list of values, and the object just took a reference to my list instead of copying the values from it. Unless I abandon the list that I passed as a parameter, anything I do with it later (using it to initialize something else with the same values, or clearing it and filling it with new values) silently destroys the state of my Triangle object.
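A minimal sketch reproducing the problem, assuming a simple mutable Vertex class with X and Y properties (hypothetical; the question does not show its definition). AliasingTriangle mirrors the constructor above and adds a VertexCount property only so the effect can be observed:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical vertex type; the question does not show its definition.
public class Vertex
{
    public double X { get; set; }
    public double Y { get; set; }
    public Vertex(double x, double y) { X = x; Y = y; }
}

public class AliasingTriangle
{
    private readonly List<Vertex> _values;

    // Stores the caller's list itself, not a copy of it.
    public AliasingTriangle(List<Vertex> vertices)
    {
        _values = vertices;
    }

    // Exposed only to make the shared state observable.
    public int VertexCount => _values.Count;
}
```

After var t = new AliasingTriangle(list); list.Clear();, t.VertexCount is 0, because the triangle and the caller hold references to the same list.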
My first reaction was to "fix the bug" in the constructor but then I started thinking if it's really the way it should be. What's the good practice that covers things like that? In general, what should I think about constructors/init methods that take a list of values? Should they leave it intact? Am I allowed to reuse the list and whose fault is it when it leads to an error?
I mean, I obviously can do something like that:
var triangle = new Triangle(new List<Vertex>(vertices));
but shouldn't it be done by the creators of the Triangle class already?
I would like to know some guidelines on that. Thanks.
Yes, the receiving class (Triangle) should make a copy, unless the design is to intentionally share the List.
Sharing can be useful but is the exception. I don't think a Triangle wants to share its List of vertices with something else.
Note that it could still be sharing the vertices (elements).
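To illustrate the difference, here is a sketch (again assuming a hypothetical mutable Vertex class) of a Triangle that copies the incoming sequence into its own List<Vertex>. Clearing the caller's list no longer affects the triangle, but both lists still point at the same Vertex objects, so changing a vertex's coordinates is visible through either one:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable vertex type, used only for illustration.
public class Vertex
{
    public double X { get; set; }
    public double Y { get; set; }
    public Vertex(double x, double y) { X = x; Y = y; }
}

public class Triangle
{
    private readonly List<Vertex> _vertices;

    // Copies the sequence: later Add/Remove/Clear on the caller's list has no effect here.
    // The Vertex objects themselves are still shared (a shallow copy).
    public Triangle(IEnumerable<Vertex> vertices)
    {
        _vertices = new List<Vertex>(vertices);
    }

    public IReadOnlyList<Vertex> Vertices => _vertices;
}
```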
Personally I agree with Henk; you should create a copy.
/// <summary>
/// Initialises a new instance of the <see cref="Triangle"/> class that
/// contains elements copied from the specified collection.
/// </summary>
/// <param name="vertices">
/// The collection of vertices whose elements are to be copied.
/// </param>
public Triangle(IEnumerable<Vertex> vertices)
{
    this.Vertices = new List<Vertex>(vertices);
}
Whatever you choose, just make sure you document it so consumers know what behaviour to expect.
With a comment like the one above, consumers know that they can safely call new Triangle(vertices) and keep using their own list afterwards.
C# is a pass-by-value language, but since a list is a reference type, it passes its reference by value. As you stated, this means you are passing a shared reference of the list to the constructor of your class. Modifications made anywhere in the code will affect the same list.
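A small sketch of what passing a reference by value means in practice (the class and method names are made up for illustration): mutating the list through the parameter is visible to the caller, reassigning the parameter is not, and only a ref parameter lets the callee replace the caller's variable.

```csharp
using System.Collections.Generic;

public static class ParameterDemo
{
    // Mutating the object the reference points to is visible to the caller.
    public static void AddItem(List<int> items) => items.Add(42);

    // Reassigning the parameter only changes the method's own copy of the reference;
    // the caller's variable still points at the original list.
    public static void Replace(List<int> items) => items = new List<int> { 1, 2, 3 };

    // With 'ref', the caller's variable itself is passed, so reassignment is visible.
    public static void ReplaceByRef(ref List<int> items) => items = new List<int> { 1, 2, 3 };
}
```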
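It depends on the desired behavior of your class as to what is the appropriate action. If you want your own copy, the easiest way is to allocate a new list in the constructor and pass the incoming IEnumerable to the list's constructor. Note that this is a shallow copy: the new list holds references to the same Vertex objects, so a true deep copy would also have to copy each vertex.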
If you want to share the reference, it is a completely valid solution, just make sure you document your class (or name your class) appropriately.
Passing a List object to the constructor would be considered poor design in this case. Perhaps a better solution would be to use a method:
class Triangle
{
    private readonly List<Vertex> Vertices = new List<Vertex>(); // The triangle owns the vertex collection...

    public void SetVertices(IEnumerable<Vertex> vertices)
    {
        this.Vertices.Clear();
        this.Vertices.AddRange(vertices);
    }
}
I'd say that this is a documentation issue. The documentation, even if it's just the intellisense docs, should say whether the class is initialized using the values from the given list, or if it will use the given list directly. Given any mutable reference type, this is a valid question and should be documented.
Lacking proper documentation, I'd say it's up to you, the consumer of the class, to protect yourself against undocumented behaviors. You have two choices:
Find out for yourself what documentation should have told you. You can use either Reflector or simple experimentation to determine what the code does with the mutable object you pass it.
Protect yourself against the class's behavior, whatever it may be. If a class takes a mutable object, don't reuse that object. This way, even if the class's behavior changes later, you're secure.
In your specific case, I don't think that the Triangle class is wrong. Its constructor could have taken an IEnumerable<Vertex> [1] and initialized a member List<Vertex> with those values, but instead, its designer chose to take a List<Vertex> directly and use that. That could have been a performance-based decision.
[1] To be complete, if a bit pedantic, I should mention that even if it took an IEnumerable<Vertex>, you could still run into this same issue. The class could still store and reuse a reference to this object, and therefore be sensitive to changes later made to the list. In this case, however, I would consider the Triangle class to be broken. Convention states, with few exceptions, that a method or constructor that takes an IEnumerable will use it once and then discard it.
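The convention in the footnote can be illustrated with two hypothetical classes over IEnumerable<int>: one keeps the sequence and re-enumerates it later, so it sees changes the caller makes afterwards; the other enumerates it once in the constructor and keeps its own copy.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical "broken" design: keeps the IEnumerable and re-enumerates it later.
public class StoresSequence
{
    private readonly IEnumerable<int> _source;
    public StoresSequence(IEnumerable<int> source) { _source = source; }

    // Enumerates the caller's sequence again on every call.
    public int Count => _source.Count();
}

// Conventional design: enumerates the sequence once, in the constructor, and keeps a copy.
public class CopiesSequence
{
    private readonly List<int> _items;
    public CopiesSequence(IEnumerable<int> source) { _items = source.ToList(); }
    public int Count => _items.Count;
}
```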
What you need is a Clone or deep copy of the List.
Refer to this answer for cloning a list
And this for more about deep copies, in general
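Since the linked answers are not reproduced here, a minimal sketch of the difference, assuming the hypothetical Vertex class has a Clone() method (something you would have to write yourself): a shallow copy creates a new list holding the same vertices, while a deep copy also copies each vertex.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical mutable vertex type with a hand-written copy method.
public class Vertex
{
    public double X { get; set; }
    public double Y { get; set; }
    public Vertex(double x, double y) { X = x; Y = y; }
    public Vertex Clone() => new Vertex(X, Y);
}

public static class VertexCopies
{
    // Shallow copy: a new list, but the same Vertex objects.
    public static List<Vertex> Shallow(IEnumerable<Vertex> source) => new List<Vertex>(source);

    // Deep copy: a new list containing new Vertex objects.
    public static List<Vertex> Deep(IEnumerable<Vertex> source) => source.Select(v => v.Clone()).ToList();
}
```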
I am currently struggling to understand something I just saw somewhere.
Let's say I have two classes:
class MyFirstClass {
    public int membVar1;
    private int membVar2;
    public string membVar3;
    private string membVar4;
    public MyFirstClass() {
    }
}
and :
class MySecondClass {
    private MyFirstClass firstClassObject = new MyFirstClass();
    public MyFirstClass FirstClassObject {
        get {
            return firstClassObject;
        }
    }
}
If I do something like this:
var secondClassObject = new MySecondClass() {
    FirstClassObject = { membVar1 = 42, membVar3 = "foo" }
};
secondClassObject is an instance of MySecondClass, and it has one private member variable of type MyFirstClass, which is exposed through a read-only property. However, I am able to change the state of membVar1 and membVar3. Isn't there an encapsulation problem?
Best regards,
Al_th
The fact that the FirstClassObject property on MySecondClass has no setter does not mean that the object returned from the getter becomes immutable. Since it has public fields, these fields are mutable. Therefore it is perfectly legal to say secondClassObject.FirstClassObject.membVar1 = 42. The absence of the setter only means that you cannot replace the object reference stored in the firstClassObject field with a reference to a different object.
Please note: You are not changing the value of MySecondClass.FirstClassObject. You are simply changing the values inside that property.
Compare the following two snippets. The first is legal, the second is not as it tries to assign a new value to the FirstClassObject property:
// legal:
var secondClassObject = new MySecondClass() {
    FirstClassObject = { membVar1 = 42, membVar3 = "foo" }
};
// won't compile:
// Property or indexer 'FirstClassObject' cannot be assigned to -- it is read only
var secondClassObject = new MySecondClass() {
    FirstClassObject = new MyFirstClass { membVar1 = 42, membVar3 = "foo" }
};
Basically, your code is just a very fancy way of writing this:
var secondClassObject = new MySecondClass();
secondClassObject.FirstClassObject.membVar1 = 42;
secondClassObject.FirstClassObject.membVar3 = "foo";
And that's how I would write it. It is explicit and understandable.
Neither a storage location of type MyFirstClass, nor the value returned by a property of type MyFirstClass, contains fields membVar1, membVar2, etc. The storage location or property instead contains information sufficient to either identify an instance of MyFirstClass or indicate that it is "null". In some languages or frameworks, there exist reference types which identify an object but only allow certain operations to be performed on it, but Java and .NET both use Promiscuous Object References: if an object allows outside code that holds a reference to do something with it, any outside code that gets a reference will be able to do that.
If a class is using a mutable object to encapsulate its own state, and wishes to allow the outside world to see that state but not allow the outside world to tamper with it, it must not return the object directly to the outside code but instead give the outside code something else. Possibilities include:
Expose all the aspects of state encompassed by the object individually (e.g. have a membVar1 property which returns the value of the encapsulated object's membVar1). This can avoid confusion, but provides a caller with no way to handle the properties as a group.
Return a new instance of a read-only wrapper which holds a reference to the private object, and has members that forward read requests (but not write requests) to those members. The returned object will serve as a read-only "view", but outside code will have no nice way to identify whether two such objects are views of the same underlying object.
Have a field of a read-only-wrapper type which is initialized in the constructor, and have a property return that. If each object will only have one read-only wrapper associated with it, two wrapper references will view the same wrapped object only if they identify the same wrapper.
Create an immutable copy of the underlying data, perhaps by creating a new mutable copy and returning a new read-only wrapper to it. This will give the caller a "snapshot" of the data, rather than a live "view".
Create a new mutable copy of the underlying data, and return that. This has the disadvantage that a caller who tries to change the underlying data by changing the copy will be allowed to change the copy without any warnings, but the operation won't work. All of the arguments for why mutable structs are "evil" apply doubly here: code which receives an exposed-field structure should expect that changes to the received structure won't affect the source from which it came, but code which receives a mutable class object has no way of knowing that. Properties should not behave this way; such behavior is generally only appropriate for methods which make clear their intention (e.g. FirstClassObjectAsNewMyFirstClass()).
Require that the caller pass in a mutable object of a type that can accept the underlying data, and copy the data into that. This gives the caller the data in a mutable form (which in some cases may be easier to work with) but at the same time avoids any confusion about who "owns" the object. As an added bonus, if the caller will be making many queries, the caller may reuse the same mutable object for all of them, thus avoiding unnecessary object allocations.
Encapsulate the data within a structure, and have a property return the structure. Some people may balk at such usage, but it's a useful convention in cases where a caller may want to piecewise-modify the data. This approach only really works if the data in question is limited to a fixed set of discrete values (such as the coordinates and dimensions of a rectangle), but has the advantage that if the caller understands what a .NET structure is (as all .NET programmers should) the semantics are inherently obvious.
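A minimal sketch of the read-only wrapper options, using the question's MyFirstClass and MySecondClass names; ReadOnlyFirstClassView, MembVar1, MembVar3 and Update are hypothetical names introduced for illustration. The owner creates a single wrapper in its constructor, so callers always get the same live view and cannot write through it:

```csharp
using System;

// Mutable type from the question, trimmed to its two public fields.
public class MyFirstClass
{
    public int membVar1;
    public string membVar3;
}

// Read-only view: forwards reads to the wrapped object and has no setters.
public sealed class ReadOnlyFirstClassView
{
    private readonly MyFirstClass _inner;

    public ReadOnlyFirstClassView(MyFirstClass inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public int MembVar1 => _inner.membVar1;
    public string MembVar3 => _inner.membVar3;
}

public class MySecondClass
{
    private readonly MyFirstClass firstClassObject = new MyFirstClass();

    // A single wrapper, created once per owner and always returned by the property.
    private readonly ReadOnlyFirstClassView firstClassView;

    public MySecondClass()
    {
        firstClassView = new ReadOnlyFirstClassView(firstClassObject);
    }

    public ReadOnlyFirstClassView FirstClassObject => firstClassView;

    // The owner alone decides how its encapsulated state changes.
    public void Update(int membVar1, string membVar3)
    {
        firstClassObject.membVar1 = membVar1;
        firstClassObject.membVar3 = membVar3;
    }
}
```

Because the wrapper is created once, two references obtained from the same MySecondClass instance are the same object, which addresses the identity problem mentioned for a wrapper created on every call.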
Of these choices, only the last two make clear via the type system what semantics the caller should expect. Accepting a mutable object from the caller offers clear semantics, but makes usage awkward. Returning an exposed-field structure offers clear semantics but only if the data consists of a fixed set of discrete values. Returning a mutable copy of the data is sometimes useful, but is only appropriate if the method name makes clear what it is doing. The other choices generally leave ambiguous the question of whether the data represents a snapshot or a live "view".
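To make the exposed-field structure option concrete, a sketch with a hypothetical Rect struct returned by a Widget.Bounds property. The caller gets its own copy: it can modify that copy piecewise, but the widget only changes through its own methods. (Writing widget.Bounds.Width = 5 directly does not compile, because it would modify a temporary copy.)

```csharp
// Hypothetical exposed-field struct for a fixed set of discrete values.
public struct Rect
{
    public int X, Y, Width, Height;
}

public class Widget
{
    private Rect _bounds;

    // Returns a copy of the struct, not a reference to the widget's own field.
    public Rect Bounds => _bounds;

    public void Resize(int width, int height)
    {
        _bounds.Width = width;
        _bounds.Height = height;
    }
}
```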
From this answer, I came to know that KeyValuePair is immutable.
I browsed through the docs, but could not find any information regarding immutable behavior.
I was wondering how to determine if a type is immutable or not?
I don't think there's a standard way to do this, since there is no official concept of immutability in C#. The only way I can think of is looking at certain things, indicating a higher probability:
1) All properties of the type have a private set
2) All fields are const/readonly or private
3) There are no methods with obvious/known side effects
4) Also, being a struct generally is a good indication (if it is BCL type or by someone with guidelines for this)
Something like an ImmutableAttribute would be nice. There are some thoughts here (somewhere down in the comments), but I haven't seen one in "real life" yet.
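As a rough, automated version of the first two indicators, reflection can at least tell you whether every instance field a type declares is readonly. A hedged sketch; the helper name and the two sample types are hypothetical, and passing this check says nothing about whether the field types are themselves immutable:

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ImmutabilityHeuristics
{
    // Heuristic only: true if every instance field declared directly on the type
    // (public or private, including auto-property backing fields) is readonly.
    // Inherited fields and the mutability of field types are not checked.
    public static bool AllInstanceFieldsReadOnly(Type type) =>
        type.GetFields(BindingFlags.Instance | BindingFlags.Public |
                       BindingFlags.NonPublic | BindingFlags.DeclaredOnly)
            .All(f => f.IsInitOnly);
}

// Hypothetical sample types.
public sealed class ReadOnlyPoint
{
    public readonly int X;
    public ReadOnlyPoint(int x) { X = x; }
}

public class MutablePoint
{
    public int X;
}
```

ValueTuple<int,int>, whose Item1 and Item2 are public mutable fields, fails the check. A property with a private set is backed by a non-readonly field, so this check is stricter than indicator 1.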
The first indication would be that the documentation for the property in the overview says "Gets the key in the key/value pair."
The second more definite indication would be in the description of the property itself:
"This property is read/only."
I don't think you can find "proof" of immutability by just looking at the docs, but there are several strong indicators:
It's a struct (why does this matter?)
It has no settable public properties (both are read-only)
It has no obvious mutator methods
For definitive proof, I recommend downloading the BCL's reference source from Microsoft or using an IL decompiler to show you what a type looks like in code.
A KeyValuePair<T1,T2> is a struct which, absent Reflection, can only be mutated outside its constructor by copying the contents of another KeyValuePair<T1,T2> which holds the desired values. Note that the statement:
MyKeyValuePair = new KeyValuePair<int,int>(1,2);
like all similar constructor invocations on structures, actually works by creating a new temporary instance of KeyValuePair<int,int> (happens before the constructor itself executes), setting the field values of that instance (done by the constructor), copying all public and private fields of that new temporary instance to MyKeyValuePair, and then discarding the temporary instance.
Consider the following code:
static KeyValuePair<int,int> MyKeyValuePair; // Field in some class
// Thread1
MyKeyValuePair = new KeyValuePair<int,int>(1,1);
// ***
MyKeyValuePair = new KeyValuePair<int,int>(2,2);
// Thread2
string st = MyKeyValuePair.ToString();
Because MyKeyValuePair is only eight bytes long (two ints), the second statement in Thread1 may well update both fields in a single write on a 64-bit runtime. Despite that, if that statement executes between Thread2's evaluation of MyKeyValuePair.Key.ToString() and MyKeyValuePair.Value.ToString(), the second ToString() will act upon the new, mutated value of the structure, even though the first, already-completed ToString() operated upon the value before the mutation.
All non-trivial structs, regardless of how they are declared, have the same immutability rules for their fields: code which can change a struct can change its fields; code which cannot change a struct cannot change its fields. Some structs may force one to go through hoops to change one of their fields, but designing struct types to be "immutable" is neither necessary nor sufficient to ensure the immutability of instances. There are a few reasonable uses of "immutable" struct types, but such use cases if anything require more care than is necessary for structs with exposed public fields.
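A sketch of the point above, using a hypothetical Pair struct whose fields are declared readonly but which is not itself a readonly struct. Both plain assignment and assigning to this inside a method overwrite the whole value, readonly fields included:

```csharp
// Fields are readonly, but the struct is not declared 'readonly struct'.
public struct Pair
{
    public readonly int Key;
    public readonly int Value;

    public Pair(int key, int value) { Key = key; Value = value; }

    // Legal in a non-readonly struct: overwrites the entire instance,
    // "readonly" fields included, in whatever variable it is stored.
    public void Overwrite(Pair other) { this = other; }
}
```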
I am wondering how immutability is defined? If the values aren't exposed as public, so can't be modified, then it's enough?
Can the values be modified inside the type, not by the customer of the type?
Or can one only set them inside a constructor? If so, in the cases of double initialization (using the this keyword on structs, etc) is still ok for immutable types?
How can I guarantee that the type is 100% immutable?
If the values aren't exposed as public, so can't be modified, then it's enough?
No, because you need read access.
Can the values be modified inside the type, not by the customer of the type?
No, because that's still mutation.
Or can one only set them inside a constructor?
Ding ding ding! With the additional point that immutable types often have methods that construct and return new instances, and also often have extra constructors marked internal specifically for use by those methods.
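A minimal sketch of that pattern, using a hypothetical Money type: all state is assigned in the constructor, and operations that look like modifications return new instances. For brevity it omits the extra internal constructors mentioned above.

```csharp
using System;

// Sketch of an immutable type: state is set only during construction.
public sealed class Money
{
    public decimal Amount { get; }    // get-only auto-property, backed by a readonly field
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        if (currency == null) throw new ArgumentNullException(nameof(currency));
        Amount = amount;
        Currency = currency;
    }

    // "Modifying" operations return a new instance and leave this one untouched.
    public Money WithAmount(decimal amount) => new Money(amount, Currency);

    public Money Add(Money other)
    {
        if (other.Currency != Currency) throw new InvalidOperationException("Currency mismatch.");
        return new Money(Amount + other.Amount, Currency);
    }
}
```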
How can I guarantee that the type is 100% immutable?
In .Net it's tricky to get a guarantee like this, because you can use reflection to modify (mutate) private members.
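For example, reflection can overwrite a private readonly instance field after construction. A sketch with a hypothetical Token class (GetField and SetValue are standard System.Reflection members); static readonly fields may be treated differently by newer runtimes, so this only demonstrates the instance-field case:

```csharp
using System.Reflection;

public sealed class Token
{
    private readonly string _value;
    public Token(string value) { _value = value; }
    public string Value => _value;
}

public static class ReflectionMutation
{
    // Overwrites a private readonly instance field through reflection.
    public static void ForceValue(Token token, string newValue)
    {
        FieldInfo field = typeof(Token).GetField("_value", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(token, newValue);
    }
}
```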
The previous posters have already stated that you should assign values to your fields in the constructor and then keep your hands off them. But that is sometimes easier said than done. Let's say that your immutable object exposes a property of the type List<string>. Is that list allowed to change? And if not, how will you control it?
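One common answer to that question, sketched here with a hypothetical Recipe class: copy the incoming sequence so the caller's list cannot affect the object later, and expose it only through the read-only wrapper returned by List<T>.AsReadOnly(). The System.Collections.Immutable package (ImmutableList<T>, ImmutableArray<T>) is an alternative if you can take the dependency.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class Recipe
{
    private readonly ReadOnlyCollection<string> _ingredients;

    public Recipe(IEnumerable<string> ingredients)
    {
        // Copy first, so later changes to the caller's collection are not visible;
        // then wrap, so callers cannot Add/Remove through the property.
        _ingredients = new List<string>(ingredients).AsReadOnly();
    }

    public IReadOnlyList<string> Ingredients => _ingredients;
}
```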
Eric Lippert has written a series of posts on his blog about immutability in C# that you might find interesting; you can find the first part here.
One thing that might be missed in all these answers is that an object can be considered immutable even if its internal state changes, as long as those internal changes are not visible to the 'client' code.
For example, the System.String class is immutable, but I think it would be permitted to cache the hash code for an instance so the hash is only calculated on the first call to GetHashCode(). Note that as far as I know, the System.String class does not do this, but I think it could and still be considered immutable. Of course any of these changes would have to be handled in a thread-safe manner (in keeping with the non-observable aspect of the changes).
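A sketch of that kind of invisible mutation, using a hypothetical PersonName class (System.String itself is not involved). The cached hash field changes after construction, but every call still returns the same value, so callers cannot tell:

```csharp
using System;

public sealed class PersonName
{
    private readonly string _first;
    private readonly string _last;

    // Hash cache; 0 means "not computed yet". This field changes after construction,
    // but the change cannot be observed through the public members.
    private int _cachedHash;

    public PersonName(string first, string last)
    {
        _first = first ?? throw new ArgumentNullException(nameof(first));
        _last = last ?? throw new ArgumentNullException(nameof(last));
    }

    public string First => _first;
    public string Last => _last;

    public override bool Equals(object obj) =>
        obj is PersonName other && _first == other._first && _last == other._last;

    public override int GetHashCode()
    {
        int h = _cachedHash;
        if (h == 0)
        {
            // Racing threads may both compute this, but they compute the same value
            // and int writes are atomic, so the race is benign.
            h = HashCode.Combine(_first, _last);
            _cachedHash = h;
        }
        return h;
    }
}
```

A hash that happens to compute to 0 is simply recomputed on each call, which is still correct, just not cached.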
To be honest though, I can't think of many reasons one might want or need this type of 'invisible mutability'.
Here is the definition of immutability from Wikipedia (link)
"In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created."
Essentially, once the object is created, none of its properties can be changed. An example is the String class. Once a String object is created it cannot be changed. Any operation done to it actually creates a new String object.
Lots of questions there. I'll try to answer each of them individually:
"I am wondering how immutability is defined?" - Straight from the Wikipedia page (and a perfectly accurate/concise definition)
An immutable object is an object whose state cannot be modified after it is created
"If the values aren't exposed as public, so can't be modified, then it's enough?" - Not quite. It can't be modified in any way whatsoever, so you've got to ensure that methods/functions don't change the state of the object and, when performing operations, always return a new instance.
"Can the values be modified inside the type, not by the customer of the type?" - Technically, it can't be modified either inside the type or by a consumer of the type. In practice, types such as System.String (a reference type, for that matter) exist that can be considered immutable for almost all practical purposes, though not in theory.
"Or can one only set them inside a constructor?" - Yes, in theory that's the only place where state (variables) can be set.
"If so, in the cases of double initialization (using the this keyword on structs, etc) is still ok for immutable types?" - Yes, that's still perfectly fine, because it's all part of the initialisation (creation) process, and the instance isn't returned until it has finished.
"How can I guarantee that the type is 100% immutable?" - The following conditions should ensure that. (Someone please point out if I'm missing one.)
Don't expose any variables. They should all be kept private (not even protected is acceptable, since derived classes can then modify state).
Don't allow any instance methods to modify state (variables). This should only be done in the constructor; methods that need to return a "modified" object should create a new instance using a particular constructor.
All members that are exposed (as read-only) or objects returned by methods must themselves be immutable.
Note: you can't ensure the immutability of derived types, since they can define new variables. This is a reason for marking any type you want to make sure is immutable as sealed, so that no derived class can be considered to be of your base immutable type anywhere in code.
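To show why sealing matters, a sketch with hypothetical Point and LabeledPoint classes: Point has only get-only properties, but because it is not sealed, a subclass can add mutable state and override a virtual member such as ToString(), so code that only holds a Point reference sees its result change:

```csharp
// Not sealed: looks immutable on its own...
public class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }
    public override string ToString() => $"({X}, {Y})";
}

// ...but a subclass can add mutable state, and its instances are still Points.
public class LabeledPoint : Point
{
    public string Label { get; set; }
    public LabeledPoint(int x, int y, string label) : base(x, y) { Label = label; }

    // Overrides a virtual member inherited from object, so the result can change
    // over time even for callers that only hold a Point reference.
    public override string ToString() => $"{Label} {base.ToString()}";
}
```

Declaring Point as sealed would make LabeledPoint a compile error.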
Hope that helps.
I've learned that immutability is when you set everything in the constructor and cannot modify it later on during the lifetime of the object.
The definition of immutability can be found on Google.
Example:
immutable - literally, not able to change.
www.filosofia.net/materiales/rec/glosaen.htm
In terms of immutable data structures, the typical definition is write-once-read-many, in other words, as you say, once created, it cannot be changed.
There are some cases which are slightly in the gray area. For instance, .NET strings are considered immutable because they can't change; however, in older versions of the .NET Framework, StringBuilder internally modified a String object.
An immutable type is essentially a class that forces itself to be final from within its own code: once an instance is there, nothing can be changed. As far as I know, things are set in the constructor and then that's it. I don't see how something could be immutable otherwise.
There's unfortunately no immutable keyword in C#/VB.NET, though it has been debated. But if there are no settable auto-properties, all fields are declared with the readonly modifier (readonly fields can only be assigned in a field initializer or a constructor), and all fields are declared of an immutable type, you will have assured yourself of immutability.
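One caveat worth illustrating: readonly only protects the field itself, not the object it refers to. In this hypothetical Registry class the field can never be reassigned, yet the list it points to grows, which is why the answer also requires every field to be of an immutable type:

```csharp
using System.Collections.Generic;

public class Registry
{
    // 'readonly' stops the field from being reassigned outside a constructor...
    private readonly List<string> _names = new List<string>();

    public IReadOnlyList<string> Names => _names;

    // ...but it does not stop the list it refers to from being changed.
    public void Register(string name) => _names.Add(name);
}
```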
An immutable object is one whose observable state can never be changed by any plausible sequence of code execution. An immutable type is one which guarantees that any instances exposed to the outside world will be immutable (this requirement is often stated as requiring that the object's state may only be set in its constructor; this isn't strictly necessary in the case of objects with private constructors, nor is it sufficient in the case of objects which call outside methods on themselves during construction).
A point which other answers have neglected, however, is a definition of an object's state. If Foo is a class, the state of a List<Foo> consists of the sequence of object identities contained therein. If the only reference to a particular List<Foo> instance is held by code which will neither cause that sequence to be changed, nor expose it to code that might do so, then that instance will be immutable, regardless of whether the Foo objects referred to therein are mutable or immutable.
To use an analogy, if one has a list of automobile VINs (Vehicle Identification Numbers) printed on tamper-evident paper, the list itself would be immutable even though cars aren't. Even if the list contains ten red cars today, it might contain ten blue cars tomorrow; they would still, however, be the same ten cars.
In a recent project I was working on, I created a structure in my class to solve a problem I was having. A colleague who was looking over my shoulder looked derisively at the structure and said "move it into a class".
I didn't have any argument for not moving it into a class other than I only need it in this class but this kind of falls down because couldn't I make it a nested class?
When is it ok to use a structure?
You should check out the value type usage guidelines: http://msdn.microsoft.com/en-us/library/y23b5415(vs.71).aspx
The article lists several important points, but the ones I find most valuable are the following:
Is the value immutable?
Do you want the type to have value semantics?
If the answer to both questions is yes then you almost certainly want to use a Structure. Otherwise I would advise going with a class.
There are issues with using structures with a large amount of members. But I find that if I consider the two points above, rarely do I have more than the recommended number of members / size in my value types.
MSDN has a good guidelines document covering structure usage. To summarize, consider a structure when instances:
Act like primitive types.
Have an instance size under 16 bytes.
Are immutable.
Value semantics are desirable.
Otherwise, use a class.
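A short sketch of what value semantics means in practice, with hypothetical PointStruct and PointClass types: assigning a struct copies its fields, while assigning a class copies only the reference, and the default Equals follows the same split:

```csharp
// Value type: assignment copies the fields.
public struct PointStruct
{
    public int X;
    public int Y;
}

// Reference type: assignment copies the reference; both variables see one object.
public class PointClass
{
    public int X;
    public int Y;
}
```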
You should always use a Class as your first choice, changing to Structure only for very specific reasons (as others have already outlined).
Depending on how much you "only need it in this class", you might be able to avoid the nested type completely by using an anonymous type; this will only work within a single method:
Public Class Foo
    Public Sub Bar()
        Dim baz = New With { .Str = "String", .I = 314 }
    End Sub
End Class
You can't (readily; there are a few things you can do with generics) move the instance baz outside of the Sub in a type-safe manner. Of course, an Object can hold anything, even an instance of an anonymous type.
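For comparison, a C# sketch of the same idea (the class and method names are hypothetical). In C#, anonymous-type properties are read-only and Equals compares property values; in VB, only properties marked with Key behave that way:

```csharp
using System;

public static class AnonymousTypeDemo
{
    public static string Describe()
    {
        // C# equivalent of the VB example.
        var baz = new { Str = "String", I = 314 };

        // baz.I = 0;  // would not compile: C# anonymous-type properties are read-only
        return $"{baz.Str} {baz.I}";
    }

    public static bool SameShapeAndValuesAreEqual()
    {
        var a = new { Str = "String", I = 314 };
        var b = new { Str = "String", I = 314 };
        return a.Equals(b); // same property names, types and order: compared by value
    }
}
```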
I think structures are great if you need to copy the object, or do not want it to be modified by the function you pass it to. Since a called function cannot modify the structure that was originally passed in (it gets a new copy of it instead, unless the parameter is declared ByRef, obviously), this can be a life saver, and it can save you the trouble of deep-copy craziness in .NET or the pain of implementing an ICloneSomething interface, as long as the structure holds no reference-type fields (only the references are copied, not the objects they point to).
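A sketch of that copying behaviour, using a hypothetical Settings struct. The method receives a copy and cannot change the caller's Retries, but the array field is shared because only the reference was copied; ref (ByRef in VB) passes the caller's variable itself:

```csharp
public struct Settings
{
    public int Retries;
    public int[] Delays; // reference-type field: copying the struct copies only the reference
}

public static class SettingsDemo
{
    // Receives a copy of the struct.
    public static void Tweak(Settings s)
    {
        s.Retries = 99;   // changes only the copy
        s.Delays[0] = 99; // changes the array shared with the caller
    }

    // Receives the caller's variable itself.
    public static void TweakByRef(ref Settings s)
    {
        s.Retries = 99;
    }
}
```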
But the general idea is defining a custom data structure in a more semantic way.
About moving it into a class: if you mean nesting it inside a class where it will be part of that class, that is generally good practice, since your structure is 99% of the time related to one of your classes rather than to a namespace.
If you are converting it to a class, then you need to consider "is it defining a data structure?", "is it expensive?" (since it's going to be copied all over the place), and "do you want to be affected by modifications made by the code you pass it to?"
The usage guidelines referenced by Marc and Rex are excellent and nicely cover cases where you aren't sure which one you would want. I will list some use cases where use of a struct is a requirement.
When you need to set the layout of the fields in memory
Interop with unmanaged code.
When you want to make Unions.
You need a fixed size buffer inlined.
You want to be able to do the equivalent of a reinterpret_cast with relative safety (so long as the struct does not contain any fields which are themselves reference types).
These are normally edge cases and (with the exception of interop) not recommended practices unless their use is necessary for the success of the project/program.
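For the union case, a minimal sketch using StructLayout and FieldOffset from System.Runtime.InteropServices; the struct name is hypothetical. Both fields start at offset 0, so writing the float and reading the int reinterprets the same four bytes (1.0f is 0x3F800000 in IEEE 754 single precision):

```csharp
using System.Runtime.InteropServices;

// Both fields start at byte offset 0, so they overlap: the C# equivalent of a C union.
[StructLayout(LayoutKind.Explicit)]
public struct IntFloatUnion
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}
```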
I am implementing a class to compare directory trees (in C#). At first I implemented the actual comparison in the class's constructor. Like this:
DirectoryComparer c = new DirectoryComparer("C:\\Dir1", "C:\\Dir2");
But it doesn't feel "right" to do a possible lengthy operation in the constructor. An alternative way is to make the constructor private and add a static method like this:
DirectoryComparer c = DirectoryComparer.Compare("C:\\Dir1", "C:\\Dir2");
What do you think? Do you expect a constructor to be "quick"? Is the second example better or is it just complicating the usage of the class?
BTW:
I won't mark any answer as accepted because I don't think there is a correct answer, just preference and taste.
Edit:
Just to clarify my example a little: I'm not only interested in whether the directories differ, I'm also interested in how they differ (which files). So a simple int return value won't be enough. The answer by cdragon76.myopenid.com is actually pretty close to what I want (+1 to you).
I would think a combination of the two is the "right" choice, as I would expect the Compare method to return the comparison result, not the comparer itself.
DirectoryComparer c = new DirectoryComparer();
int equality = c.Compare("C:\\Dir1", "C:\\Dir2");
...and as Dana mentions, there is an IComparer interface in .Net that reflects this pattern.
The IComparer.Compare method returns an int since the use of IComparer classes is primarily with sorting. The general pattern though fits the problem of the question in that:
Constructor initializes an instance with (optionally) "configuring" parameters
Compare method takes two "data" parameters, compares them and returns a "result"
Now, the result can be an int, a bool, a collection of diffs. Whatever fits the need.
I prefer the second one.
I expect the constructor to instantiate the class.
The method compare does what it is designed to do.
I think an interface might be what you're after. I would create a class to represent a directory, and have it implement a directory-comparison interface that includes the Compare method. If .NET already has a suitable interface, such as IComparable<T>, you could also just implement that.
In code, your call would be:
D1 = new Directory(@"C:\");
..
D1.Compare(D2);
You should never do anything that might fail in a constructor. You don't want to ever create invalid objects. While you could implement a "zombie" state where the object doesn't do much, it's much better to perform any complex logic in separate methods.
I agree with the general sentiment of not doing lengthy operations inside constructors.
Additionally, while on the subject of design, I'd consider changing your 2nd example so that the DirectoryComparer.Compare method returns something other than a DirectoryComparer object. (Perhaps a new class called DirectoryDifferences or DirectoryComparisonResult.) An object of type DirectoryComparer sounds like an object you would use to compare directories as opposed to an object that represents the differences between a pair of directories.
Then if you want to define different ways of comparing directories (such as ignoring timestamps, readonly attributes, empty directories, etc.) you could make those parameters you pass to the DirectoryComparer class constructor. Or, if you always want DirectoryComparer to have the exact same rules for comparing directories, you could simply make DirectoryComparer a static class.
For example:
DirectoryComparer comparer = new DirectoryComparer(
DirectoryComparerOptions.IgnoreDirectoryAttributes
);
DirectoryComparerResult result = comparer.Compare("C:\\Dir1", "C:\\Dir2");
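A hedged sketch of how the classes in this example might fit together. The IgnoreCase flag is hypothetical (the answer's IgnoreDirectoryAttributes is not implemented), and the comparison only checks which relative file paths exist in each tree, not sizes, timestamps or contents. The constructor only records configuration; Compare does the work and returns a separate result object:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical option set.
[Flags]
public enum DirectoryComparerOptions
{
    None = 0,
    IgnoreCase = 1,
}

// Result object: describes how two directories differ.
public sealed class DirectoryComparerResult
{
    public DirectoryComparerResult(IReadOnlyList<string> onlyInFirst, IReadOnlyList<string> onlyInSecond)
    {
        OnlyInFirst = onlyInFirst;
        OnlyInSecond = onlyInSecond;
    }

    public IReadOnlyList<string> OnlyInFirst { get; }
    public IReadOnlyList<string> OnlyInSecond { get; }
    public bool AreEquivalent => OnlyInFirst.Count == 0 && OnlyInSecond.Count == 0;
}

public sealed class DirectoryComparer
{
    private readonly StringComparer _names;

    // Cheap: the constructor only records how to compare.
    public DirectoryComparer(DirectoryComparerOptions options = DirectoryComparerOptions.None)
    {
        _names = options.HasFlag(DirectoryComparerOptions.IgnoreCase)
            ? StringComparer.OrdinalIgnoreCase
            : StringComparer.Ordinal;
    }

    // The potentially slow work happens here; compares relative file paths only.
    public DirectoryComparerResult Compare(string firstPath, string secondPath)
    {
        List<string> first = RelativeFiles(firstPath);
        List<string> second = RelativeFiles(secondPath);
        return new DirectoryComparerResult(
            first.Except(second, _names).OrderBy(p => p, StringComparer.Ordinal).ToList(),
            second.Except(first, _names).OrderBy(p => p, StringComparer.Ordinal).ToList());
    }

    private static List<string> RelativeFiles(string root) =>
        Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                 .Select(path => Path.GetRelativePath(root, path))
                 .ToList();
}
```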
Yes, typically a constructor is something quick, it is designed to prepare the object for use, not to actually do operations. I like your second option as it keeps it a one line operation.
You could also make it a bit easier by allowing the constructor to pass the two paths, then have a Compare() method that actually does the processing.
I like the second example because it explains exactly what is happening when you instantiate the object. Plus, I always use the constructor to initialize all of the global settings for the class.
I think for a general purpose comparer you may on construction only want to specify the files you are comparing and then compare later- this way you can also implement extended logic:
Compare again- what if the directories changed?
Change the files you are comparing by updating the members.
Also, you may want to consider receiving notifications from your OS in your implementation when files in the target directories have changed, and optionally comparing again.
The point is- you are imposing limits by assuming that this class will only be used to compare once for a single instance of those files.
Therefore, I prefer:
var comparer = new DirectoryComparer(dir1, dir2);
comparer.Compare();
Or
var comparer = new DirectoryComparer();
comparer.Compare(dir1, dir2);
I think it's not only okay for a constructor to take as much time as needed to construct a valid object, but the constructor is required to do so. Deferring object creation is very bad, as you end up with potentially invalid objects, so you have to check an object every time before you touch it (this is how it is done in MFC: you have bool IsValid() methods everywhere).
I only see a slight difference in the two ways of creating the object. One can see the new operator as a static function of the class anyway. So, this all boils down to syntactic sugar.
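For reference, the static-method variant from the question can be sketched like this, with a private constructor so that every instance already holds a finished result. DirectoryComparison and its members are hypothetical, and the sketch compares two lists of file names instead of real directories to stay free of file-system access:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class DirectoryComparison
{
    public IReadOnlyList<string> OnlyInFirst { get; }
    public IReadOnlyList<string> OnlyInSecond { get; }

    // Private: the only way to get an instance is through Compare, so every
    // instance already holds a finished result (no half-initialized state).
    private DirectoryComparison(IReadOnlyList<string> onlyInFirst, IReadOnlyList<string> onlyInSecond)
    {
        OnlyInFirst = onlyInFirst;
        OnlyInSecond = onlyInSecond;
    }

    // The factory method does the work first, then calls the cheap constructor.
    public static DirectoryComparison Compare(IEnumerable<string> firstNames, IEnumerable<string> secondNames)
    {
        var first = new HashSet<string>(firstNames, StringComparer.Ordinal);
        var second = new HashSet<string>(secondNames, StringComparer.Ordinal);
        return new DirectoryComparison(
            first.Where(n => !second.Contains(n)).OrderBy(n => n, StringComparer.Ordinal).ToList(),
            second.Where(n => !first.Contains(n)).OrderBy(n => n, StringComparer.Ordinal).ToList());
    }
}
```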
What does the DirectoryComparer class do? What is its responsibility? From my point of view (which is a C++ programmer's view), it looks like you'd be better off just using a free function, but I don't think that you can have free functions in C#, can you? I guess you will collect the files which are different in the DirectoryComparer object. If so, you would be better off creating something like an array of files, or an equivalent class that's named accordingly.
If you are working with C#, you could use extension methods to create a method for comparing two directories and attach it to the built-in DirectoryInfo class (System.IO.Directory itself is a static class, so it cannot be extended this way). It would look something like:
DirectoryInfo dir1 = new DirectoryInfo(@"C:\.....");
DirectoryInfo dir2 = new DirectoryInfo(@"D:\.....");
DirectoryCompare c = dir1.CompareTo(dir2);
This would be much clearer implementation.
More on extension methods here.
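A minimal sketch of such an extension method on System.IO.DirectoryInfo. The CompareTo name and the tuple it returns are hypothetical (DirectoryInfo has no member of that name); it only lists top-level file names present in one directory but not the other:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirectoryInfoComparisonExtensions
{
    // Hypothetical extension method: returns the top-level file names
    // present in exactly one of the two directories.
    public static (IReadOnlyList<string> OnlyInThis, IReadOnlyList<string> OnlyInOther) CompareTo(
        this DirectoryInfo dir, DirectoryInfo other)
    {
        var mine = new HashSet<string>(dir.EnumerateFiles().Select(f => f.Name), StringComparer.Ordinal);
        var theirs = new HashSet<string>(other.EnumerateFiles().Select(f => f.Name), StringComparer.Ordinal);
        return (mine.Except(theirs).OrderBy(n => n, StringComparer.Ordinal).ToList(),
                theirs.Except(mine).OrderBy(n => n, StringComparer.Ordinal).ToList());
    }
}
```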
If an operation may take an unknown amount of time, it is an operation you might want to move to a different thread, so your main thread won't block and can do other things, like showing a spinning progress indicator. Other apps may not want to do this; they may want everything within a single thread (e.g. those that have no UI). Moving object creation to a separate thread is a bit awkward IMHO. I'd prefer to create the object (quickly) in my current thread and then let one of its methods run on another thread. Once that method has finished, the other thread can end, and I can fetch the result in my current thread through another method of the object before discarding the object, since I'm happy as soon as I know the result (or I keep a copy if the result involves more details that I need to consume one at a time).
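A sketch of that approach using Task.Run: the comparer is created on the calling thread, and only the expensive call runs on a thread-pool thread. SlowComparer simulates the lengthy work with Thread.Sleep and, like the other names here, is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical comparer whose work is simulated with a short delay.
public sealed class SlowComparer
{
    public IReadOnlyList<string> Compare(IEnumerable<string> first, IEnumerable<string> second)
    {
        Thread.Sleep(100); // stands in for a lengthy directory walk
        return first.Except(second).ToList();
    }
}

public static class ComparerUsage
{
    // Construction is quick and happens on the calling thread;
    // only the expensive method call is moved to a thread-pool thread.
    public static async Task<IReadOnlyList<string>> CompareInBackgroundAsync(
        IEnumerable<string> first, IEnumerable<string> second)
    {
        var comparer = new SlowComparer();
        return await Task.Run(() => comparer.Compare(first, second));
    }
}
```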
If the arguments are just going to be processed once then I don't think they belong as either constructor arguments or instance state.
If, however, the comparison service is going to support some kind of suspendable algorithm, or you want to notify listeners when the equality state of two directories changes based on file-system events or something like that, then the directories are part of the instance state.
In neither case is the constructor doing any work other than initializing an instance. In case two above the algorithm is either driven by a client, just like an Iterator for example, or it's driven by the event listening thread.
I generally try to do things like this:
Don't hold state in the instance if it can be passed as arguments to service methods.
Try to design the object with immutable state.
Defining attributes, like those used in equals and hashCode, should always be immutable.
Conceptually, a constructor is a function mapping an object representation to the object it represents.
By the definition above Integer.valueOf(1) is actually more of a constructor than new Integer(1) because Integer.valueOf(1) == Integer.valueOf(1).
In either case, this concept also means that all of the constructor arguments, and only the constructor arguments, should define the equals behavior of an object.
I would definitely do the second.
Long actions in a constructor are fine if they are actually building the object so it is usable.
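Now, one thing that I see people do in constructors is call virtual methods. This is BAD: once someone uses your class as a base class and overrides one of those methods, your constructor will call the derived class's override (in C#, as in Java) before the derived class's constructor body has run, so the override may see uninitialized state. (In C++ it is the other way around: the base class's version is called.)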
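A sketch of the hazard in C#, with hypothetical Base and Derived classes: the base constructor calls a virtual method, the derived override runs before the Derived constructor body has assigned _name, and so it sees null. (Field initializers on the declaration do run before the base constructor in C#, but assignments in the constructor body do not.)

```csharp
public class Base
{
    protected Base()
    {
        // Virtual call from a constructor: dispatches to the most-derived override.
        Initialize();
    }

    protected virtual void Initialize() { }
}

public class Derived : Base
{
    private readonly string _name;

    // Records what the override saw while the base constructor was running.
    public string NameSeenDuringBaseConstructor { get; private set; }

    public Derived(string name) : base()
    {
        _name = name; // runs only after Base() has returned
    }

    protected override void Initialize()
    {
        NameSeenDuringBaseConstructor = _name ?? "<null>";
    }
}
```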
I don't think that abstract terms like "lengthy" have anything to do with the decision whether to put something in a constructor or not.
A constructor is something that should be used to initialize an object, a method should be used to "do something", i.e. have a function.