After reading this blog entry : http://wekeroad.com/post/4069048840/when-should-a-method-be-a-property,
I'm wondering why Microsoft choose in C# :
DateTime aDt = DateTime.Now;
instead of
DateTime aDt = DateTime.Now();
Best practices say : Use a method when calling the member twice in succession produces different results
And DateTime.Now is perfect example of non-determistic method/property.
Do you know if there any reason for that design ?
Or if it's just a small mistake ?
I believe in CLR via C#, Jeffrey Richter mentions that DateTime.Now is a mistake.
The System.DateTime class has a readonly
Now property that returns the current date and time. Each time you query this
property, it will return a different value. This is a mistake, and Microsoft wishes that
they could fix the class by making Now a method instead of a property.
CLR via C# 3rd Edition - Page 243
It actually is deterministic; it's output is not random, but is based on something quite predictable.
The 'current time' changes all the time; so to be relatively "the same" with each call, that value must change so that every time it's called, it's returning the current time.
EDIT:
This just occurred to me: Of course, two subsequent calls to a property getter can return different results, if something changed the property value in the interim. Properties are not supposed to be Constants.
So, that's what happening (conceptually) with DateTime.Now; its value is being changed between subsequent calls to it.
According to MSDN you should use a property when something is a logical data member of the object:
http://msdn.microsoft.com/en-us/library/bzwdh01d%28VS.71%29.aspx#cpconpropertyusageguidelinesanchor1
The go on to list out the cases where a method would be more appropriate. What is ironic is that one of the rules for a method is to use it when successive calls may return different results and of course Now certainly meets that criteria.
Personally I think this was done to eliminate the needs for the extra (), but I have found the absence of () confusing; it took me a little while to shift from the old approach in VB/VBA.
Guidelines are just that, not hard and fast rules.
Those guidelines are intended for stateful objects, and in reality are trying to say that properties should not mutate an object. DateTime.Now is a static property, so calling it does not mutate an object. It's also merely reflecting the natural state of time, not changing anything. It is simply observing a constantly changing timer.
So the point is, don't create properties that change the state of the object. Do create properties that merely observe the state of the object (even if the state changes externally).
As another example, let's look at the length of a string. This is a property, but the length of the string can change from invocation to invocation if something else changes the string externally. That's basically what is going on, the timer is being changed externally, Now just reflects its current state just as string.Length or any other such property.
In deciding "method versus property", a suggested test is "will successive calls return different results". I would suggest that a better test is the similar, but not identical question, "will calling the routine affect the outcome of future calls to the same or different routines?" In most cases, the answers to both the questions will be the same, since by far the most common reason that later calls to a routine will yield different results from the former one would be that the former one caused the later call to return a different result than it otherwise would have.
In the case of DateTime.Now, the only way in which one call would affect the value returned by another would be if the execution time taken by the first call caused the second call to occur measurably later than it otherwise would have. While a pedant might consider the passage of time to be a state-altering side effect of the first call, I would suggest that there are many properties which take longer to execute than DateTime.Now, and thus a call to any of those would have a greater likelihood of changing the value returned by a subsequent DateTime.Now call.
Note that if the "get time" routine were a virtual class member rather than being a static member, that would shift the balance in favor of making it a method; while the "expected" implementation would not affect the state of any object, it would be likely--or at least plausible--that some implementations might have side-effects. For example, calling Now on a RemoteTimeServer object might attempt to get the time from a remote server, and such attempt might have considerable side-effects on the rest of the system (e.g. by causing one or more machines to cache DNS/IP routing information, such that the next attempt to access the same server will complete 100ms faster).
Since there are no brightline rules on when to use a method and a property, DateTime.Now is truly just reading an exposed property of the state of the server, it may be constantly changing, but DateTime.Now never effects the state of any property, object or what not, so it is a property in the Framework.
Related
When I first began as a junior C# dev, I was always told during code reviews that if I was accessing an object's property more than once in a given scope then I should create a local variable within the routine as it was cheaper than having to retrieve it from the object. I never really questioned it as it came from more people I perceived to be quite knowledgeable at the time.
Below is a rudimentary example
Example 1: storing an objects identifer in a local variable
public void DoWork(MyDataType object)
{
long id = object.Id;
if (ObjectLookup.TryAdd(id, object))
{
DoSomeOtherWork(id);
}
}
Example 2: retrieving the identifier from the Id property of the object property anytime it is needed
public void DoWork(MyDataType object)
{
if (ObjectLookup.TryAdd(object.Id, object))
{
DoSomeOtherWork(object.Id);
}
}
Does it actually matter or was it more a preference of coding style where I was working? Or perhaps a situational design time choice for the developer to make?
As explained in this answer, if the property is a basic getter/setter than the CLR "will inline the property access and generate code that’s as efficient as accessing a field directly". However, if your property, for example, does some calculations every time the property is accessed, then storing the value of the property in a local variable will avoid the overhead of additional calculations being done.
All the memory allocation stuff aside, there is the principle of DRY(don't repeat yourself). When you can deal with one variable with a short name rather than repeating the object nesting to access the external property, why not do that?
Apart from that, by creating that local variable you are respecting the single responsibility principle by isolating the methods from the external entity they don't need to know about.
And lastly if the so-called resuing leads to unwanted instantiation of reference types or any repetitive calculation, then it is a must to create the local var and reuse it throughout the class/method.
Any way you look at it, this practice helps with readability and more maintainable code, and possibly safer too.
I don't know if it is faster or not (though I would say that the difference is negligible and thus unimportant), but I'll cook up some benchmark for you.
What IS important though will be made evident to you with an example;
public Class MyDataType
{
publig int id {
get {
// Some actual code
return this.GetHashCode() * 2;
}
}
}
Does this make more sense? The first time I will access the id Getter, some code will be executed. The second time, the same code will be executed costing twice as much with no need.
It is very probable, that the reviewers had some such case in mind and instead of going into every single one property and check what you are doing and if it is safe to access, they created a new rule.
Another reason to store, would be useability.
Imagine the following example
object.subObject.someOtherSubObject.id
In this case I ask in reviews to store to a variable even if they use it just once. That is because if this is used in a complicated if statement, it will reduce the readability and maintainability of the code in the future.
A local variable is essentially guaranteed to be fast, whereas there is an unknown amount of overhead involved in accessing the property.
It's almost always a good idea to avoid repeating code whenever possible. Storing the value once means that there is only one thing to change if it needs changing, rather than two or more.
Using a variable allows you to provide a name, which gives you an opportunity to describe your intent.
I would also point out that if you're referring to other members of an object a lot in one place, that can often be a strong indication that the code you're writing actually belongs in that other type instead.
You should consider that getting a value from a method that is calculated from an I/O-bound or CPU-bound process can be irrational. Therefore, it's better to define a var and store the result to avoid multiple same processing.
In the case that you are using a value like object.Id, utilizing a variable decorated with const keyword guarantees that the value will not change in the scope.
Finally, it's better to use a local var in the classes and methods.
This is probably a matter of personal preference, but when do you use properties instead of functions in your code
For instance to get an error log I could say
string GetErrorLog()
{
return m_ErrorLog;
}
or I could
string ErrorLog
{
get { return m_ErrorLog; }
}
How do you decide which one to use? I seem to be inconsistent in my usage and I'm looking for a good general rule of thumb. Thanks.
I tend to use properties if the following are true:
The property will return a single, logic value
Little or no logic is involved (typically just return a value, or do a small check/return value)
I tend to use methods if the following are true:
There is going to be significant work involved in returning the value - ie: it'll get fetched from a DB, or something that may take "time"
There is quite a bit of logic involved, either in getting or setting the value
In addition, I'd recommend looking at Microsoft's Design Guidelines for Property Usage. They suggest:
Use a property when the member is a logical data member.
Use a method when:
The operation is a conversion, such as Object.ToString.
The operation is expensive enough that you want to communicate to the user that they should consider caching the result.
Obtaining a property value using the get accessor would have an observable side effect.
Calling the member twice in succession produces different results.
The order of execution is important. Note that a type's properties should be able to be set and retrieved in any order.
The member is static but returns a value that can be changed.
The member returns an array. Properties that return arrays can be very misleading. Usually it is necessary to return a copy of the internal array so that the user cannot change internal state. This, coupled with the fact that a user can easily assume it is an indexed property, leads to inefficient code. In the following code example, each call to the Methods property creates a copy of the array. As a result, 2n+1 copies of the array will be created in the following loop.
Here are Microsoft's guidelines:
Choosing Between Properties and Methods
Consider using a property if the member represents a logical attribute of the type.
Do use a property, rather than a method, if the value of the property is stored in the process memory and the property would just provide access to the value.
Do use a method, rather than a property, in the following situations.
The operation is orders of magnitude slower than a field set would be. If you are even considering providing an asynchronous version of an operation to avoid blocking the thread, it is very likely that the operation is too expensive to be a property. In particular, operations that access the network or the file system (other than once for initialization) should most likely be methods, not properties.
The operation is a conversion, such as the Object.ToString method.
The operation returns a different result each time it is called, even if the parameters do not change. For example, the NewGuid method returns a different value each time it is called.
The operation has a significant and observable side effect. Note that populating an internal cache is not generally considered an observable side effect.
The operation returns a copy of an internal state (this does not include copies of value type objects returned on the stack).
The operation returns an array.
I use properties when its clear the semantic is "Get somevalue from the object". However using a method is a good way to communicate "this may take a bit more than a trivial effort to return".
For example a collection could have a Count property. Its reasonable to assume a collection object knows how many items are currently held without it actually having to loop through them and count them.
On the hand this hypothetical collection could have GetSum() method which returns the total of the set of items held. The collection just a easily have a Sum property instead but by using a method it communicates the idea that the collection will have to do some real work to get an answer.
I'd never use a property if I could be affecting more than one field - I'd always use a method.
Generally, I just use the
public string ErrorLog { get; private set; }
syntax for Properties and use Methods for everything else.
In addition to Reed's answer when the property is only going to be a getter like getting a resource such as an Event Log might be. I try and only use properties when the property will be side effect free.
If there is more than something trivial happening in a property, then it should be a method. For example, if your ErrorLog getter property was actually going and reading files, then it should be a method. Accessing a property should be fast, and if it is doing much processing, it should be a method. If there are side affects of accessing a property that the user of the class might not expect, then it should probably be a method.
There is .NET Framework Design Guidelines book that covers this kind of stuff in great detail.
I got a method which accepts a collection as below
public IList<CountryDto> ApplyDefaults(IList<CountryDto> dtos)
{
//Iterates the collection
//Validates the items in collection
//If items are invalid
//Removes items e.g dtos.Remove(currentCountryDto)
return dtos;//Do I need to do this?
}
My question is since, the reference to the collection is not changed, should I return the collection again from the method?
For: By returning the collection back, I make it explicit in the signature and user is aware that the items in the collection could be different from the original source. Sort of it avoid ambiguity.
Against: Since the validation doesnt change the reference of the collection, it doesn't make sense technically to return it.
What is the best approach in this case?
Note: I am not sure if this question is opinion based. I think probably I missing something here on design side.
In every programming language consistency of your own code / library with the approach of the core libraries is of high value. Hence, inspecting how Collections.sort() or Collection.swap() and Collections.shuffle() are defined, I would suggest to not return the input parameter, if you intend to modify it. In addition, your method should be named in such a way, that it is obvious the input parameter gets modified. Otherwise your method will be considered to have side-effects.
Returning a value most often suggests that it is a new instance which reflects the work, performed by the method or is used for method-chaining in case of builders.
Given your comments/requirements:
Does not need to report if defaults are applied.
ApplyDefaults is complicated and invoking other services and not intended to produce a fluent API
ApplyDefaults is a "black box"; validation logic is injected so the calling code doesn't know/care about the validation
I think based on these, this method definitely should not return the reference to the incoming list, even if no validation is applied. Firstly, unless the API is clearly built around method chaining (which you indicated you do not want), returning a List<T> type usually indicates a new List is being created. Secondly, if a new list is not created, users may find themselves modifying the list in ways they didn't expect.
Consider:
IList<CountryDto> originalCountries = Service.GetCountries();
IList<CountryDto> validatedCountries = ApplyDefaults(originalCountries);
validatedCountries.Add(mySpecialCountry);
OutputOriginalCountries(originalCountries);
OutputValidatedCountries(validatedCountries);
This code isn't very special, and a fairly common pattern. If ApplyDefaults returned a reference to the same originalCountries collection, then mySpecialCountry would also be added to originalCountries. This would violate the Principle of Least Astonishment.
This would be exacerbated if this behaviour changed depending on whether or not items were validated/filtered. Since the validation logic is a black-box of behaviour that the caller doesn't know or care about, the API consumer could not depend on whether or not it returned the same reference. They would either have to do their own reference check (e.g., if (myValidatedCountries == myInputCountries)), or simply make a copy every time. Regardless, this becomes another weird behaviour that the programmer has to juggle when working with the API.
I think that the method should either:
A) always return a copied list with the items filtered out (public IList<CountryDto> ApplyDefaults(IEnumerable<CountryDto> dtos))
B) modify the incoming list in-place (public void ApplyDefaults(IList<CountryDto> dtos))
For option A, depending on the size of your list, this incurs the possible unnecessary work of creating a copied list every time even if no filtering is performed. However, the validation/filtering logic might be simpler. You might be able to use LINQ queries to apply the filtering nicely. Additionally, removing items from a list is generally costly as it has to rebuild the internal array. So it might actually be faster to build a new list. You may even simplify the signature here to be IEnumerable<CountryDto>; this allows for wider usage and is extremely obvious that you're creating a new collection.
For option B, if no validation is required, then no work is done and the method is essentially "free" (no array rebuilding, no copying, no reference changes). But if there is significant validation, the removal aspect may be costly. Since you're not method chaining, this version should have a void return type as it's much more obvious to the developer that this is modifying the list in-place. This follows other commonly known methods like List<T>.Sort. Furthermore, if a user wants to have a separate originalCountries and validatedCountries they can always make a copy:
var validatedCountries = originalCountries.ToList();
ApplyDefaults(validatedCountries);
Ultimately, which one you choose might depend on performance. If validation/removal is cheap and rare, then modifying the list in-place might be best. If you're expecting a lot of changes to the list, it might simply be faster to produce a new copy every time.
Regardless, I would suggest you name the method with a little more clarity as well. For example:
public IList<CountryDto> GetValidCountries(IEnumerable<CountryDto> dtos)
public void RemoveInvalidCountries(IList<CountryDto> dtos)
Of course, the naming might be different depending on your actual code context (I suspect ApplyDefaults is a common/inherited method name and not specific to CountryDto)
I'd rather return boolean (or enum in an elaborated case: collection preserved intact,
changed, can't be validated etc.)
// true if the collection is changed, false otherwise
public Boolean ApplyDefaults(IList<CountryDto> dtos) {
Boolean result = false;
//Iterates the collection
//Validates the items in collection
//If items are invalid:
// Removes items e.g dtos.Remove(currentCountryDto)
// result = true;
...
return result;
}
...
if (ApplyDefaults(myData)) {
// Collection is changed, do some extra stuff
}
First of all: you cannot change the reference of the collection you send by parameter, because by default you're getting copy of it. You'd need to use a ref keyword in order to be able to change it.
Secondly: if your method has a return type, than it has to return an object. Your method is not called GetNewCollectionWithAppliedDefaults, but ApplyDefaults which implies that the collection will be modified. You should either return boolean true/false to inform user changes were done or always return parameter's collecion (to allow nested methods calling).
Also, why would you think it doesn't make sense to return a collection? I'd say there's no argument against it. Turn the question around: "why wouldn't I return the collection and could it harm my code"?
Technically, I would say there is not much difference between the two.
However, and as you pointed out, a common used convention is that a function should only return an object it creates. Basically, that would mean that a function that returns an object is generating one while a function which doesn't return anything is modifying the object passed as a parameter.
Again, this is only a convention and it is not widely used within the C# community, but in the python community for example, it is.
Some people, returns a Boolean (or an error code) instead as an indicator of an error (like the old dos command line). I don't like this approach and prefer by far raising exceptions that I can handle later on.
Finally, the best approach in my regard, is to return a value that indicates if a change was done by the function and eventually a value indicating how much of a change was done. It can be a Boolean or it can be the number of inserted/removed elements...
In any case, try to be consistent with the approach you chose, if not in all your code, at least within a single project. Sometimes, you will have no other choice but to abide with the convention used by your teammates.
(My answer is based on the Java viewpoint; C++ and C# programmers might have a different take.) I think it's best to return the collection. The fact that the collection you're returning is the same collection that was given is just an implementation detail, and in future versions of the code, you might want to change that. Document that the collection returned might not be the same one passed in.
If, on the other hand, you want to lock in the design that this method modifies a collection in place, document it that way and don't return the collection. I prefer not to do it this way, but I can see advantages in some contexts.
In your case I would leave void since ApplyDefaults clearly states what its doing.
Also, it might be a good idea to ApplyDefaults in the collection itself. Subclass IList or List or whatever and then you'd call like this:
myCollection.ApplyDefaults();
Which is just obvious.
I tend to assume that getters are little more than an access control wrapper around an otherwise fairly lightweight set of instructions to return a value (or set of values).
As a result, when I find myself writing longer and more CPU-hungry setters, I feel Perhaps this is not the smartest move. In calling a getter in my own code (in particular let's refer to C# where there is a syntactical difference between method vs. getter calls), then I make an implicit assumption that these are lightweight -- when in fact that may well not be the case.
What's the general consensus on this? Use of other people's libraries aside, do you write heavy getters? Or do you tend to treat heavier getters as "full methods"?
PS. Due to language differences, I expect there'll be quite a number of different thoughts on this...
Property getters are intended to retrieve a value. So when developers call them, there is an expectation that the call will return (almost) immediately with a value. If that expectation cannot be met, it is better to use a method instead of a property.
From MSDN:
Property Usage Guidelines
Use a method when:
[...]
The operation is expensive enough that you want to communicate to the
user that they should consider caching
the result.the result.
And also:
Choosing Between Properties and Methods
Do use a method, rather than a
property, in the following situations.
The operation is orders of magnitude slower than a field set would be. If
you are even considering providing an
asynchronous version of an operation
to avoid blocking the thread, it is
very likely that the operation is too
expensive to be a property. In
particular, operations that access the
network or the file system (other than
once for initialization) should most
likely be methods, not properties.
True. Getters should either access a simple member, or should compute and cache a derived value and then return the cached value (subsequent gets without interleaved sets should merely return that value). If I have a function that is going to do a lot of computation, then I name it computeX, not getX.
All in all, very few of my methods are so expensive in terms of time that it would matter based on the guidelines as posted by Thomas. But the thing is that generally calls to a getter should not affect that state of the class. I have no problem writing a getter that actually runs a calculation when called though.
In general, I write short, efficient ones. But you might have complex ones -- you need to consider how the getter will be used. And if it is an external API, you don't have any control how it is used - so shoot for efficiency.
I would agree with this. It is useful to have calculated properties for example for things like Age based on DateOfBirth. But I would avoid complex logic like having to go to a database just to calculate the value of an object's property. Use method in that case.
My opinion is that getter should be lightweight, but again as you say there is a broad definition of "lightweight", adding a logger is fine for tracing purpose, and probably some cache logic too and database/web service retrieval .. ouch. your getter is already considered heavy.
Getter are syntaxic sugar like setters, I consider that method are more flexible because of the simplicity of using them asynchronously.
But there is no expectation set for your getter performance (maybe try to mention it in the cough documentation ), as it could be trying to retrieve fresh values from slow source.
Others are certainly considering getter for simple objects, but as your object could be a proxy for your backend object, I really see not point too set performance expectations as it helps you makes the code more readable and more maintainable.
So my answer would be, "it depends", mainly on the level of abstraction of your object ( short logic for low level object as the value should probably be calculated on the setter level, long ones for hight level ).
I would like to get your opinion on as how far to go with side-effect-free setters.
Consider the following example:
Activity activity;
activity.Start = "2010-01-01";
activity.Duration = "10 days"; // sets Finish property to "2010-01-10"
Note that values for date and duration are shown only for indicative purposes.
So using setter for any of the properties Start, Finish and Duration will consequently change other properties and thus cannot be considered side-effect-free.
Same applies for instances of the Rectangle class, where setter for X is changing the values of Top and Bottom and so on.
The question is where would you draw a line between using setters, which have side-effects of changing values of logically related properties, and using methods, which couldn't be much more descriptive anyway. For example, defining a method called SetDurationTo(Duration duration) also doesn't reflect that either Start or Finish will be changed.
I think you're misunderstanding the term "side-effect" as it applies to program design. Setting a property is a side effect, no matter how much or how little internal state it changes, as long as it changes some sort of state. A "side-effect-free setter" would not be very useful.
Side-effects are something you want to avoid on property getters. Reading the value of a property is something that the caller does not expect to change any state (i.e. cause side-effects), so if it does, it's usually wrong or at least questionable (there are exceptions, such as lazy loading). But getters and setters alike are just wrappers for methods anyway. The Duration property, as far as the CLR is concerned, is just syntactic sugar for a set_Duration method.
This is exactly what abstractions such as classes are meant for - providing coarse-grained operations while keeping a consistent internal state. If you deliberately try to avoid having multiple side-effects in a single property assignment then your classes end up being not much more than dumb data containers.
So, answering the question directly: Where do I draw the line? Nowhere, as long as the method/property actually does what its name implies. If setting the Duration also changed the ActivityName, that might be a problem. If it changes the Finish property, that ought to be obvious; it should be impossible to change the Duration and have both the Start and Finish stay the same. The basic premise of OOP is that objects are intelligent enough to manage these operations by themselves.
If this bothers you at a conceptual level then don't have mutator properties at all - use an immutable data structure with read-only properties where all of the necessary arguments are supplied in the constructor. Then have two overloads, one that takes a Start/Duration and another that takes a Start/Finish. Or make only one of the properties writable - let's say Finish to keep it consistent with Start - and then make Duration read-only. Use the appropriate combination of mutable and immutable properties to ensure that there is only one way to change a certain state.
Otherwise, don't worry so much about this. Properties (and methods) shouldn't have unintended or undocumented side effects, but that's about the only guideline I would use.
Personally, I think it makes sense to have a side-effect to maintain a consistent state. Like you said, it makes sense to change logically-related values. In a sense, the side-effect is expected. But the important thing is to make that point clear. That is, it should be evident that the task the method is performing has some sort of side-effect. So instead of SetDurationTo you could call your function ChangeDurationTo, which implies something else is going on. You could also do this another way by having a function/method that adjusts the duration AdjustDurationTo and pass in a delta value. It would help if you document the function as having a side-effect.
I think another way to look at it is to see if a side-effect is expected. In your example of a Rectangle, I would expect it to change the values of top or bottom to maintain an internally-consistent state. I don't know if this is subjective; it just seems to make sense to me. As always, I think documentation wins out. If there is a side-effect, document it really well. Preferably by the name of the method and through supporting documentation.
One option is to make your class immutable and have methods create and return new instances of the class which have all appropriate values changed. Then there are no side effects or setters. Think of something like DateTime where you can call things like AddDays and AddHours which will return a new DateTime instance with the change applied.
I have always worked with the general rule of not allowing public setters on properties that are not side-effect free since callers of your public setters can't be certain of what might happen, but of course, people that modify the assembly itself should have a pretty good idea as they can see the code.
Of course, there are always times where you have to break the rule for the sake of either readability, to make your object model logical, or just to make things work right. Like you said, really a matter of preference in general.
I think it's mostly a matter of common-sense.
In this particular example, my problem is not so much that you've got properties that adjust "related" properties, it's that you've got properties taking string values that you're then internaly parsing into DateTime (or whatever) values.
I would much rather see something like this:
Activity activity;
activity.Start = DateTime.Parse("2010-01-01");
activity.Duration = Duration.Parse("10 days");
That is, you explicity note that you're doing parsing of strings. Allow the programmer to specify strong-typed objects when that is appropriate as well.