Name for this pattern? (Answer: lazy initialization with double-checked locking) - c#

Consider the following code:
public class Foo
{
    private static object _lock = new object();

    public void NameDoesNotMatter()
    {
        if( SomeDataDoesNotExist() )
        {
            lock(_lock)
            {
                if( SomeDataDoesNotExist() )
                {
                    CreateSomeData();
                }
                else
                {
                    // someone else also noticed the lack of data. We
                    // both contended for the lock. The other guy won
                    // and created the data, so we no longer need to.
                    // But once he got out of the lock, we got in.
                    // There's nothing left to do.
                }
            }
        }
    }

    private bool SomeDataDoesNotExist()
    {
        // Note - this method must be thread-safe.
        throw new NotImplementedException();
    }

    private bool CreateSomeData()
    {
        // Note - This shouldn't need to be thread-safe
        throw new NotImplementedException();
    }
}
First, there are some assumptions I need to state:
There is a good reason I couldn't just do this once at app startup. Maybe the data wasn't available yet, etc.
Foo may be instantiated and used concurrently from two or more threads. I want one of them to end up creating some data (but not both of them) then I'll allow both to access that same data (ignore thread safety of accessing the data)
The cost to SomeDataDoesNotExist() is not huge.
Now, this doesn't necessarily have to be confined to some data creation situation, but this was an example I could think of.
The part that I'm especially interested in identifying as a pattern is the check -> lock -> check. I've had to explain this pattern to developers on a few occasions who didn't get the algorithm at first glance but could then appreciate it.
Anyway, other people must do similarly. Is this a standardized pattern? What's it called?

Though I can see how you might think this looks like double-checked locking, what it actually looks like is dangerously broken and incorrect double-checked locking. Without an actual implementation of SomeDataDoesNotExist and CreateSomeData to critique we have no guarantee whatsoever that this thing is actually threadsafe on every processor.
For an example of an analysis of how double-checked locking can go wrong, check out this broken and incorrect version of double-checked locking:
C# manual lock/unlock
My advice: don't use any low-lock technique without a compelling reason and a code review from an expert on the memory model; you'll probably get it wrong. Most people do.
In particular, don't use double-checked locking unless you can describe exactly what memory access reorderings the processors can do on your behalf and provide a convincing argument that your solution is correct given any possible memory access reordering. The moment you step away even slightly from a known-to-be-correct implementation, you need to start the analysis over from scratch. You can't assume that just because one implementation of double-checked locking is correct, that they all are; almost none of them are correct.

Lazy initialization with double-checked locking?

The part that I'm especially interested in identifying as a pattern is the check -> lock -> check.
That is called double-checked locking.
Beware that in older Java versions (before Java 5) it is not safe because of how Java's memory model was defined. In Java 5 and newer, the specification of the memory model was changed so that it is safe, provided the field being checked is declared volatile.
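In C#, the check -> lock -> check shape is usually written against a single volatile field. The following is only a minimal sketch under that assumption: DataHolder, GetData and CreateCount are hypothetical names, and the "data" is a single string reference. It is not a review of any particular SomeDataDoesNotExist/CreateSomeData implementation.

```csharp
using System;
using System.Threading;

public static class DataHolder
{
    private static readonly object _lock = new object();

    // volatile: the unlocked first check must not see a stale or
    // half-published reference.
    private static volatile string _data;

    // Hypothetical counter, only here to show that creation runs once.
    public static int CreateCount;

    public static string GetData()
    {
        if (_data == null)              // first check, no lock taken
        {
            lock (_lock)
            {
                if (_data == null)      // second check, under the lock
                {
                    Interlocked.Increment(ref CreateCount);
                    _data = "some data";
                }
            }
        }
        return _data;
    }
}
```

Calling DataHolder.GetData() from several threads should create the data once; every later call skips the lock entirely. Anything more elaborate than one volatile reference needs its own analysis.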

The only name that comes to mind for this kind of thing is "Faulting". This name is used in the iOS Core Data framework to similar effect.
Basically, your method NameDoesNotMatter is a fault, and whenever someone invokes it, it results in the object being populated or initialized.
See http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/CoreData/Articles/cdFaultingUniquing.html for more details on how this design pattern is used.

Related

Microsoft's remark to ReaderWriterLockSlim.IsReadLockHeld/IsWriteLockHeld and its consequences

To synchronize the access to my properties I use the ReaderWriterLockSlim class. I use the following code to access my properties in a thread-safe way.
public class SomeClass
{
public readonly ReaderWriterLockSlim SyncObj = new ReaderWriterLockSlim();
public string AProperty
{
get
{
if (SyncObj.IsReadLockHeld)
return ComplexGetterMethod();
SyncObj.EnterReadLock();
try
{
return ComplexGetterMethod();
}
finally
{
SyncObj.ExitReadLock();
}
}
set
{
if (SyncObj.IsWriteLockHeld)
ComplexSetterMethod(value);
else
{
SyncObj.EnterWriteLock();
ComplexSetterMethod(value);
SyncObj.ExitWriteLock();
}
}
}
// more properties here ...
private string ComplexGetterMethod()
{
// This method is not thread-safe and reads
// multiple values, calculates stuff, etc.
throw new NotImplementedException();
}
private void ComplexSetterMethod(string newValue)
{
// This method is not thread-safe and reads
// and writes multiple values.
throw new NotImplementedException();
}
}
// =====================================
public static SomeClass AClass = new SomeClass();
public void SomeMultiThreadFunction()
{
...
// access with locking from within the setter
AClass.AProperty = "new value";
...
// locking from outside of the class to increase performance
AClass.SyncObj.EnterWriteLock();
AClass.AProperty = "new value 2";
AClass.AnotherProperty = "...";
...
AClass.SyncObj.ExitWriteLock();
...
}
To avoid unnecessary locks whenever I get or set multiple properties at once, I published the ReaderWriterLockSlim object and lock it from outside the class every time I'm about to get or set a bunch of properties. To achieve this, my getter and setter methods check whether the lock has been acquired, using the IsReadLockHeld and IsWriteLockHeld properties of ReaderWriterLockSlim. This works fine and has increased the performance of my code.
So far so good, but when I re-read the documentation about IsReadLockHeld and IsWriteLockHeld I noticed the remark from Microsoft:
This property is intended for use in asserts or for other debugging
purposes. Do not use it to control the flow of program execution.
My question is: Is there a reason why I should not use IsReadLockHeld/IsWriteLockHeld for this purpose? Is there anything wrong with my code? Everything works as expected and much faster than using recursive locks (LockRecursionPolicy.SupportsRecursion).
To clarify: This is a minimal example. I don't want to know if the lock itself is necessary or can be removed or achieved in a different way. I just want to know why I should not use IsReadLockHeld/IsWriteLockHeld to control the flow of the program, as stated by the documentation.
After some further research I posted the same question on the German Support Forum of the Microsoft Developer Network and got into discussion with the very helpful moderator Marcel Roma. He was able to contact the programmer of the ReaderWriterLockSlim Joe Duffy who wrote this answer:
I'm afraid my answer may leave something to be desired.
The property works fine and as documented. The guidance really is just
because conditional acquisition and release of locks tends to be buggy
and error-prone in practice, particularly with exceptions thrown into
the mix.
It's typically a good idea to structure your code so that you either
use recursive acquires, or you don't, (and of course the latter is
always easier to reason about); using properties like IsReadLockHeld
lands you somewhere in the middle.
I was one of the primary designers of RWLS and I have to admit it has
way too many bells and whistles. I don't necessarily regret adding
IsReadLockHeld -- as it can come in handy for debugging and assertions
-- however as soon as we added it, Pandora's box was opened, and RWLS was instantly opened up to this kind of usage.
I'm not surprised that people want to use it as shown in the
StackOverflow thread, and I'm sure there are some legitimate scenarios
where it works better than the alternatives. I merely advise erring on
the side of not using it.
To sum things up: You can use the IsReadLockHeld and the IsWriteLockHeld property to acquire a lock conditionally and everything will work fine, but it is bad programming style and one should avoid it. It is better to stick to recursive or non-recursive locks. To maintain a good coding style IsReadLockHeld and IsWriteLockHeld should only be used for debugging purposes.
I want to thank Marcel Roma and Joe Duffy again for their precious help.
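The non-conditional structure recommended in that reply can be sketched roughly as follows. This is an assumption about how one might restructure such a class, not the code from the thread: Settings, A, B and Update are hypothetical names. Each property always takes the lock, a batch method takes the write lock once for several fields, and IsWriteLockHeld appears only in an assertion.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public class Settings
{
    private readonly ReaderWriterLockSlim _sync = new ReaderWriterLockSlim();
    private string _a = "";
    private string _b = "";

    public string A
    {
        get
        {
            _sync.EnterReadLock();
            try { return _a; }
            finally { _sync.ExitReadLock(); }
        }
    }

    public string B
    {
        get
        {
            _sync.EnterReadLock();
            try { return _b; }
            finally { _sync.ExitReadLock(); }
        }
    }

    // Batch update: one unconditional write lock covers both fields.
    public void Update(string a, string b)
    {
        _sync.EnterWriteLock();
        try
        {
            // The property is used for an assertion only, never for control flow.
            Debug.Assert(_sync.IsWriteLockHeld);
            _a = a;
            _b = b;
        }
        finally { _sync.ExitWriteLock(); }
    }
}
```

The lock object stays private, so callers cannot forget to release it, and no code path has to ask whether the lock is already held.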
Documentation is advising you the right thing.
Consider the following interleaved execution.
Thread1.AcquireReadLock();
Thread1.ComplexGetterMethod();
Thread2.ReadIsReaderLockHeldProperty();
Thread1.ReleaseReadLock();
Thread2.ComplexGetterMethod(); // performing read without lock.
The other wrong thing with your code that I see is
SyncObj.EnterReadLock();
try
{
return ComplexGetterMethod();
}
finally
{
SyncObj.ExitReadLock();
}
is not the right way to do things. This is the right one:
try
{
SyncObj.EnterReadLock();
return ComplexGetterMethod();
}
finally
{
if (SyncObj.IsReadLockHeld)
SyncObj.ExitReadLock();
}
And this shall be exact definition of your getter method.

Good or bad practice? Initializing objects in getter

I have a strange habit it seems... according to my co-worker at least. We've been working on a small project together. The way I wrote the classes is (simplified example):
[Serializable()]
public class Foo
{
public Foo()
{ }
private Bar _bar;
public Bar Bar
{
get
{
if (_bar == null)
_bar = new Bar();
return _bar;
}
set { _bar = value; }
}
}
So, basically, I only initialize any field when a getter is called and the field is still null. I figured this would reduce overhead by not initializing any properties that aren't used anywhere.
ETA: The reason I did this is that my class has several properties that return an instance of another class, which in turn also have properties with yet more classes, and so on. Calling the constructor for the top class would subsequently call all constructors for all these classes, when they are not always all needed.
Are there any objections against this practice, other than personal preference?
UPDATE: I have considered the many differing opinions in regards to this question and I will stand by my accepted answer. However, I have now come to a much better understanding of the concept and I'm able to decide when to use it and when not.
Cons:
Thread safety issues
Not obeying a "setter" request when the value passed is null
Micro-optimizations
Exception handling should take place in a constructor
Need to check for null in class' code
Pros:
Micro-optimizations
Properties never return null
Delay or avoid loading "heavy" objects
Most of the cons are not applicable to my current library, however I would have to test to see if the "micro-optimizations" are actually optimizing anything at all.
LAST UPDATE:
Okay, I changed my answer. My original question was whether or not this is a good habit. And I'm now convinced that it's not. Maybe I will still use it in some parts of my current code, but not unconditionally and definitely not all the time. So I'm going to lose my habit and think about it before using it. Thanks everyone!
What you have here is a - naive - implementation of "lazy initialization".
Short answer:
Using lazy initialization unconditionally is not a good idea. It has its places but one has to take into consideration the impacts this solution has.
Background and explanation:
Concrete implementation:
Let's first look at your concrete sample and why I consider its implementation naive:
It violates the Principle of Least Surprise (POLS). When a value is assigned to a property, it is expected that this value is returned. In your implementation this is not the case for null:
foo.Bar = null;
Assert.Null(foo.Bar); // This will fail
It introduces quite some threading issues: Two callers of foo.Bar on different threads can potentially get two different instances of Bar and one of them will be without a connection to the Foo instance. Any changes made to that Bar instance are silently lost.
This is another case of a violation of POLS. When only the stored value of a property is accessed it is expected to be thread-safe. While you could argue that the class simply isn't thread-safe - including the getter of your property - you would have to document this properly as that's not the normal case. Furthermore the introduction of this issue is unnecessary as we will see shortly.
In general:
It's now time to look at lazy initialization in general:
Lazy initialization is usually used to delay the construction of objects that take a long time to be constructed or that take a lot of memory once fully constructed.
That is a very valid reason for using lazy initialization.
However, such properties normally don't have setters, which gets rid of the first issue pointed out above.
Furthermore, a thread-safe implementation would be used - like Lazy<T> - to avoid the second issue.
Even when considering these two points in the implementation of a lazy property, the following points are general problems of this pattern:
Construction of the object could be unsuccessful, resulting in an exception from a property getter. This is yet another violation of POLS and therefore should be avoided. Even the section on properties in the "Design Guidelines for Developing Class Libraries" explicitly states that property getters shouldn't throw exceptions:
Avoid throwing exceptions from property getters.
Property getters should be simple operations without any preconditions. If a getter might throw an exception, consider redesigning the property to be a method.
Automatic optimizations by the compiler are hurt, namely inlining and branch prediction. Please see Bill K's answer for a detailed explanation.
The conclusion of these points is the following:
For each single property that is implemented lazily, you should have considered these points.
That means, that it is a per-case decision and can't be taken as a general best practice.
This pattern has its place, but it is not a general best practice when implementing classes. It should not be used unconditionally, because of the reasons stated above.
In this section I want to discuss some of the points others have brought forward as arguments for using lazy initialization unconditionally:
Serialization:
EricJ states in one comment:
An object that may be serialized will not have its constructor invoked when it is deserialized (depends on the serializer, but many common ones behave like this). Putting initialization code in the constructor means that you have to provide additional support for deserialization. This pattern avoids that special coding.
There are several problems with this argument:
Most objects never will be serialized. Adding some sort of support for it when it is not needed violates YAGNI.
When a class needs to support serialization there exist ways to enable it without a workaround that doesn't have anything to do with serialization at first glance.
Micro-optimization:
Your main argument is that you want to construct the objects only when someone actually accesses them. So you are actually talking about optimizing the memory usage.
I don't agree with this argument for the following reasons:
In most cases, a few more objects in memory have no impact whatsoever on anything. Modern computers have more than enough memory. Without a case of actual problems confirmed by a profiler, this is premature optimization and there are good reasons against it.
I acknowledge the fact that sometimes this kind of optimization is justified. But even in these cases lazy initialization doesn't seem to be the correct solution. There are two reasons speaking against it:
Lazy initialization potentially hurts performance. Maybe only marginally, but as Bill's answer showed, the impact is greater than one might think at first glance. So this approach basically trades performance versus memory.
If you have a design where it is a common use case to use only parts of the class, this hints at a problem with the design itself: The class in question most likely has more than one responsibility. The solution would be to split the class into several more focused classes.
It is a good design choice. Strongly recommended for library code or core classes.
It is called by some "lazy initialization" or "delayed initialization" and it is generally considered by all to be a good design choice.
First, if you initialize in the declaration of class level variables or constructor, then when your object is constructed, you have the overhead of creating a resource that may never be used.
Second, the resource only gets created if needed.
Third, you avoid garbage collecting an object that was not used.
Lastly, it is easier to handle initialization exceptions that may occur in the property than exceptions that occur during initialization of class level variables or the constructor.
There are exceptions to this rule.
Regarding the performance argument of the additional check for initialization in the "get" property, it is insignificant. Initializing and disposing an object is a more significant performance hit than a simple null pointer check with a jump.
Design Guidelines for Developing Class Libraries at http://msdn.microsoft.com/en-US/library/vstudio/ms229042.aspx
Regarding Lazy<T>
The generic Lazy<T> class was created exactly for what the poster wants, see Lazy Initialization at http://msdn.microsoft.com/en-us/library/dd997286(v=vs.100).aspx. If you have older versions of .NET, you have to use the code pattern illustrated in the question. This code pattern has become so common that Microsoft saw fit to include a class in the latest .NET libraries to make it easier to implement the pattern. In addition, if your implementation needs thread safety, then you have to add it.
Primitive Data Types and Simple Classes
Obviously, you are not going to use lazy initialization for primitive data types or simple classes like List<string>.
Before Commenting about Lazy
Lazy<T> was introduced in .NET 4.0, so please don't add yet another comment regarding this class.
Before Commenting about Micro-Optimizations
When you are building libraries, you must consider all optimizations. For instance, in the .NET classes you will see bit arrays used for Boolean class variables throughout the code to reduce memory consumption and memory fragmentation, just to name two "micro-optimizations".
Regarding User-Interfaces
You are not going to use lazy initialization for classes that are directly used by the user-interface. Last week I spent the better part of a day removing lazy loading of eight collections used in a view-model for combo-boxes. I have a LookupManager that handles lazy loading and caching of collections needed by any user-interface element.
"Setters"
I have never used a set-property ("setters") for any lazy loaded property. Therefore, you would never allow foo.Bar = null;. If you need to set Bar then I would create a method called SetBar(Bar value) and not use lazy-initialization
Collections
Class collection properties are always initialized when declared because they should never be null.
Complex Classes
Let me repeat this differently, you use lazy-initialization for complex classes. Which are usually, poorly designed classes.
Lastly
I never said to do this for all classes or in all cases. It is a bad habit.
Do you consider implementing such pattern using Lazy<T>?
In addition to easy creation of lazy-loaded objects, you get thread safety while the object is initialized:
http://msdn.microsoft.com/en-us/library/dd642331.aspx
As others said, you lazily-load objects if they're really resource-heavy or it takes some time to load them during object construction-time.
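For the Foo/Bar example from the question, a Lazy<T>-based version might look like the sketch below. It assumes the property needs no setter; IsBarCreated is a hypothetical helper added only to make the deferred construction visible.

```csharp
using System;

public class Bar { }

public class Foo
{
    // With this constructor, Lazy<T> defaults to
    // LazyThreadSafetyMode.ExecutionAndPublication, so the factory
    // runs at most once even when several threads race on Value.
    private readonly Lazy<Bar> _bar = new Lazy<Bar>(() => new Bar());

    // No setter: callers can never assign null and get something else back.
    public Bar Bar => _bar.Value;

    // Hypothetical helper to observe whether Bar has been built yet.
    public bool IsBarCreated => _bar.IsValueCreated;
}
```

Every caller of foo.Bar then sees the same Bar instance, and nothing is constructed until the first access.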
I think it depends on what you are initialising. I probably wouldn't do it for a list as the construction cost is quite small, so it can go in the constructor. But if it was a pre-populated list then I probably wouldn't until it was needed for the first time.
Basically, if the cost of construction outweighs the cost of doing a conditional check on each access then lazy create it. If not, do it in the constructor.
Lazy instantiation/initialization is a perfectly viable pattern. Keep in mind, though, that as a general rule consumers of your API do not expect getters and setters to take discernable time from the end user POV (or to fail).
The downside that I can see is that if you want to ask if Bars is null, it would never be, and you would be creating the list there.
I was just going to put a comment on Daniel's answer but I honestly don't think it goes far enough.
Although this is a very good pattern to use in certain situations (for instance, when the object is initialized from the database), it's a HORRIBLE habit to get into.
One of the best things about an object is that it offers a secure, trusted environment. The very best case is if you make as many fields as possible "Final", filling them all in with the constructor. This makes your class quite bulletproof. Allowing fields to be changed through setters is a little less so, but not terrible. For instance:
class SafeClass
{
    String name = "";
    Integer age = 0;

    public void setName(String newName)
    {
        assert newName != null;
        name = newName;
    } // follow this pattern for age
    ...
    public String toString() {
        return "Safe Class has name:" + name + " and age:" + age;
    }
}
With your pattern, the toString method would look like this:
public String toString() {
    if (name == null)
        throw new IllegalStateException("SafeClass got into an illegal state! name is null");
    if (age == null)
        throw new IllegalStateException("SafeClass got into an illegal state! age is null");
    return "Safe Class has name:" + name + " and age:" + age;
}
Not only this, but you need null checks everywhere you might possibly use that object in your class. (Outside your class is safe because of the null check in the getter, but you should mostly be using your class's members inside the class.)
Also your class is perpetually in an uncertain state--for instance if you decided to make that class a hibernate class by adding a few annotations, how would you do it?
If you make any decision based on some micro-optimization without requirements and testing, it's almost certainly the wrong decision. In fact, there is a really good chance that your pattern is actually slowing down the system even under the most ideal of circumstances, because the if statement can cause a branch prediction failure on the CPU, which will slow things down many more times than just assigning a value in the constructor, unless the object you are creating is fairly complex or coming from a remote data source.
For an example of the branch prediction problem (which you are incurring repeatedly, not just once), see the first answer to this awesome question: Why is it faster to process a sorted array than an unsorted array?
Let me just add one more point to many good points made by others...
The debugger will (by default) evaluate the properties when stepping through the code, which could potentially instantiate the Bar sooner than would normally happen by just executing the code. In other words, the mere act of debugging is changing the execution of the program.
This may or may not be a problem (depending on side-effects), but is something to be aware of.
Are you sure Foo should be instantiating anything at all?
To me it seems smelly (though not necessarily wrong) to let Foo instantiate anything at all. Unless it is Foo's express purpose to be a factory, it should not instantiate its own collaborators, but instead get them injected in its constructor.
If however Foo's purpose of being is to create instances of type Bar, then I don't see anything wrong with doing it lazily.

Double checked locking on Dictionary "ContainsKey"

My team is currently debating this issue.
The code in question is something along the lines of
if (!myDictionary.ContainsKey(key))
{
lock (_SyncObject)
{
if (!myDictionary.ContainsKey(key))
{
myDictionary.Add(key,value);
}
}
}
Some of the posts I've seen say that this may be a big NO NO (when using TryGetValue). Yet members of our team say it is ok since "ContainsKey" does not iterate on the key collection but checks if the key is contained via the hash code in O(1). Hence they claim there is no danger here.
I would like to get your honest opinions regarding this issue.
Don't do this. It's not safe.
You could be calling ContainsKey from one thread while another thread calls Add. That's simply not supported by Dictionary<TKey, TValue>. If Add needs to reallocate buckets etc, I can imagine you could get some very strange results, or an exception. It may have been written in such a way that you don't see any nasty effects, but I wouldn't like to rely on it.
It's one thing using double-checked locking for simple reads/writes to a field, although I'd still argue against it - it's another to make calls to an API which has been explicitly described as not being safe for multiple concurrent calls.
If you're on .NET 4, ConcurrentDictionary is probably the way forward. Otherwise, just lock on every access.
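As a rough sketch of what that could look like for the "add if missing" case in the question: Registry and its two methods are hypothetical names, and string/int stand in for whatever key and value types the real dictionary uses.

```csharp
using System;
using System.Collections.Concurrent;

public static class Registry
{
    private static readonly ConcurrentDictionary<string, int> _map =
        new ConcurrentDictionary<string, int>();

    // Adds the pair only if the key is absent; returns false otherwise.
    // No external lock and no separate ContainsKey call are needed.
    public static bool AddIfMissing(string key, int value)
    {
        return _map.TryAdd(key, value);
    }

    // Returns the existing value, or stores and returns a new one.
    // Under contention the factory may run more than once, but only
    // one result is ever stored.
    public static int GetOrCreate(string key, Func<string, int> factory)
    {
        return _map.GetOrAdd(key, factory);
    }
}
```

Both the check and the insert happen inside one thread-safe call, which is exactly the step the unlocked ContainsKey gets wrong.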
If you are in a multithreaded environment, you may prefer to look at using a ConcurrentDictionary. I blogged about it a couple of months ago, you might find the article useful: http://colinmackay.co.uk/blog/2011/03/24/parallelisation-in-net-4-0-the-concurrent-dictionary/
This code is incorrect. The Dictionary<TKey, TValue> type does not support simultaneous read and write operations. Even though your Add method is called within the lock, the ContainsKey is not. Hence it easily allows for a violation of the simultaneous read / write rule and will lead to corruption in your instance.
It doesn't look thread-safe, but it would probably be hard to make it fail.
The iteration vs hash lookup argument doesn't hold, there could be a hash-collision for instance.
If this dictionary is rarely written and often read, then I often employ safe double locking by replacing the entire dictionary on write. This is particularly effective if you can batch writes together to make them less frequent.
For example, this is a cut down version of a method we use that tries to get a schema object associated with a type, and if it can't, then it goes ahead and creates schema objects for all the types it finds in the same assembly as the specified type to minimize the number of times the entire dictionary has to be copied:
public static Schema GetSchema(Type type)
{
if (_schemaLookup.TryGetValue(type, out Schema schema))
return schema;
lock (_syncRoot) {
if (_schemaLookup.TryGetValue(type, out schema))
return schema;
var newLookup = new Dictionary<Type, Schema>(_schemaLookup);
foreach (var t in type.Assembly.GetTypes()) {
var newSchema = new Schema(t);
newLookup.Add(t, newSchema);
}
_schemaLookup = newLookup;
return _schemaLookup[type];
}
}
So the dictionary in this case will be rebuilt, at most, as many times as there are assemblies with types that need schemas. For the rest of the application lifetime the dictionary accesses will be lock-free. The dictionary copy becomes a one-time initialization cost of the assembly. The dictionary swap is thread-safe because pointer writes are atomic so the whole reference gets switched at once.
You can apply similar principles in other situations as well.

c# vb: Should we use System.Lazy for resource-intensive task? (when threading is not needed)

I'm wondering is there some kind of JIT-hack going on with System.Lazy to make things more performant or is it purely a "normal class"?
From the page http://msdn.microsoft.com/en-us/library/dd642331.aspx it says:
Use an instance of Lazy(Of T) to defer
the creation of a large or
resource-intensive object or the
execution of a resource-intensive
task, particularly when such creation
or execution might not occur during
the lifetime of the program.
but i can defer the execution of a resource-intensive task using a simple boolean flag couldn't i? So what exactly is the difference? (other than System.Lazy has additional overheads for no apparent "syntax sugar" gains)
With a simple boolean flag its simply:
if (!deferred) {
//run resource-intensive task
}
Edit:
here's an example
class Human{
System.Lazy<String> name = new System.Lazy<String>(() =>
{
//code here takes 4 seconds to run
return "the value";
});
String Name
{
get
{
return name.Value;
}
}
}
vs
class Human
{
String name;
bool name_initiated;
String Name
{
get
{
if (!name_initiated)
{
//code here takes 4 seconds to run
name = "the value";
name_initiated = true;
}
return name;
}
}
}
6 May: I now use this a lot. And I really mean a lot. I use it whenever I need to cache data (even when the computation takes 0.1 seconds or less). Hence my question: should I be worried? I know you will tell me to profile the app, but I'm building the library before I build the app, and if the app turns out to have problems at that point, fixing them would mean a major change.
Yes, you could defer it with a simple Boolean flag. Of course, you'd need to handle volatility of both the flag and the result... and make sure you knew what you wanted in terms of the result if one thread asks for the result while it's still being computed. Oh, and try to avoid locking where possible. And make it all bulletproof in terms of thread safety.
Nah, no benefit at all to using a type built by experts ;)
Seriously: why do it yourself if someone else has done it for you? Why write the code to check a flag, work out how to wait safely, lock everything etc. Even if it were a relatively simple thing to get right, it's better if it only needs to be done once in a reusable fashion.
Another good example of this principle is Nullable<T>. You could easily get most of the same behaviour yourself (not boxing) or even not bother with the encapsulation at all, and just keep a flag alongside your normal field... but with the built-in type, you get all of that implemented for free, along with syntactic sugar etc.
The Lazy class makes the process easier. It is similar to using a String instead of a character array. Not technically necessary, but can be useful.
Lazy<T> is just an encapsulation of the best way to implement a lazy singleton. If you want thread-safety, there's more to it than just if(!initialized) instance = Initialize();. I generally assume the BCL team will be better at implementing than me.
Update: Based on your sample, I would say the advantage of Lazy<> is simply less code to maintain. Beyond that, they're essentially equivalent. My advice: use Lazy<> because it's easy and move on to harder problems.
The Lazy class does all the thread-safety work for you, which is the sort of thing that is a lot more complicated than it sounds to implement by hand.
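Since the question title says threading is not needed, it may be worth knowing that Lazy<T> can be told to skip its synchronization. A minimal sketch based on the Human example from the question; the Computations counter is a hypothetical addition to show that the factory runs once, and the sketch assumes the instance is only ever used from one thread.

```csharp
using System;
using System.Threading;

public class Human
{
    // Hypothetical counter, only here to show how often the factory runs.
    public int Computations;

    private readonly Lazy<string> _name;

    public Human()
    {
        // LazyThreadSafetyMode.None: no locking at all. Only valid when
        // the instance is never accessed from more than one thread.
        _name = new Lazy<string>(() =>
        {
            Computations++;          // stands in for the expensive work
            return "the value";
        }, LazyThreadSafetyMode.None);
    }

    public string Name => _name.Value;
}
```

Reading Name twice runs the factory once; the second read returns the cached string. Whether the saved synchronization matters for your library is something only a measurement can tell.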

Is it a code smell for one method to depend on another?

I am refactoring a class so that the code is testable (using NUnit and RhinoMocks as testing and isolation frameworks) and have found myself with a method that is dependent on another (i.e. it depends on something which is created by that other method). Something like the following:
public class Impersonator
{
private ImpersonationContext _context;
public void Impersonate()
{
...
_context = GetContext();
...
}
public void UndoImpersonation()
{
if (_context != null)
_someDepend.Undo();
}
}
Which means that to test UndoImpersonation, I need to set it up by calling Impersonate (Impersonate already has several unit tests to verify its behaviour). This smells bad to me but in some sense it makes sense from the point of view of the code that calls into this class:
public void ExerciseClassToTest(Impersonator c)
{
try
{
if (NeedImpersonation())
{
c.Impersonate();
}
...
}
finally
{
c.UndoImpersonation();
}
}
I wouldn't have worked this out if I didn't try to write a unit test for UndoImpersonation and found myself having to set up the test by calling the other public method. So, is this a bad smell and if so how can I work around it?
Code smell has got to be one of the most vague terms I have ever encountered in the programming world. For a group of people that pride themselves on engineering principles, it ranks right up there in terms of unmeasurable rubbish, and about as useless a measure, as LOCs per day for programmer efficiency.
Anyway, that's my rant, thanks for listening :-)
To answer your specific question, I don't believe this is a problem. If you test something that has pre-conditions, you need to ensure the pre-conditions have been set up first for the given test case.
One of the tests should be what happens when you call it without first setting up the pre-conditions - it should either fail gracefully or set up its own pre-condition if the caller hasn't bothered to do so.
Well, there is a bit too little context to tell, but it looks like _someDepend should be initialized in the constructor.
Initializing fields in an instance method is a big NO for me. A class should be fully usable (i.e. all methods work) as soon as it is constructed; so the constructor(s) should initialize all instance variables. See e.g. the page on single step construction in Ward Cunningham's wiki.
The reason initializing fields in an instance method is bad is mainly that it imposes an implicit ordering on how you can call methods. In your case, TheMethodIWantToTest will do different things depending on whether DoStuff was called first. This is generally not something a user of your class would expect, so it's bad :-(.
That said, sometimes this kind of coupling may be unavoidable (e.g. if one method acquires a resource such as a file handle, and another method is needed to release it). But even that should be handled within one method if possible.
What applies to your case is hard to tell without more context.
Provided you don't consider mutable objects a code smell by themselves, having to put an object into the state needed for a test is simply part of the set-up for that test.
This is often unavoidable, for instance when working with remote connections - you have to call Open() before you can call Close(), and you don't want Open() to automatically happen in the constructor.
However you want to be very careful when doing this that the pattern is something readily understood - for instance I think most users accept this kind of behaviour for anything transactional, but might be surprised when they encounter DoStuff() and TheMethodIWantToTest() (whatever they're really called).
It's normally best practice to have a property that represents the current state - again look at remote or DB connections for an example of a consistently understood design.
The big no-no is for this to ever happen for properties. Properties should never care what order they are called in. If you have a simple value that does depend on the order of methods then it should be a parameterless method instead of a property-get.
Yes, I think there is a code smell in this case. Not because of dependencies between methods, but because of the vague identity of the object. Rather than having an Impersonator which can be in different persona states, why not have an immutable Persona?
If you need a different Persona, just create a new one rather than changing the state of an existing object. If you need to do some cleanup afterwards, make Persona disposable. You can keep the Impersonator class as a factory:
using (var persona = impersonator.createPersona(...))
{
// do something with the persona
}
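A disposable Persona along those lines might be sketched like this. Everything here is an assumption for illustration: the userName parameter, the IsDisposed flag and the comments marking where the real impersonation calls would go. The factory method is spelled CreatePersona to follow .NET naming.

```csharp
using System;

public sealed class Persona : IDisposable
{
    // The identity is fixed at construction and never changes.
    public string UserName { get; }

    // Hypothetical flag so callers and tests can observe the cleanup.
    public bool IsDisposed { get; private set; }

    public Persona(string userName)
    {
        UserName = userName;
        // Assumption: impersonation would start here.
    }

    public void Dispose()
    {
        if (IsDisposed) return;   // Dispose must be safe to call twice
        // Assumption: impersonation would be undone here.
        IsDisposed = true;
    }
}

public class Impersonator
{
    // Factory: each call yields a fresh Persona instead of mutating state.
    public Persona CreatePersona(string userName)
    {
        return new Persona(userName);
    }
}
```

With this shape there is no UndoImpersonation that depends on a previous Impersonate call; a test creates a Persona, disposes it, and checks the result.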
To answer the title: having methods call each other (chaining) is unavoidable in object oriented programming, so in my view there is nothing wrong with testing a method that calls another. The unit under test can be a whole class, after all; it's a "unit" you're testing.
The level of chaining depends on the design of your object - you can either fork or cascade.
Forking:
classToTest1.SomeDependency.DoSomething()
Cascading:
classToTest1.DoSomething() (which internally would call SomeDependency.DoSomething)
But as others have mentioned, definitely keep your state initialisation in the constructor which from what I can tell, will probably solve your issue.
