Whose responsibility is it to cache / memoize function results? - c#

I'm working on software which allows the user to extend a system by implementing a set of interfaces.
In order to test the viability of what we're doing, my company "eats its own dog food" by implementing all of our business logic in these classes in the exact same way a user would.
We have some utility classes / methods that tie everything together and use the logic defined in the extendable classes.
I want to cache the results of the user-defined functions. Where should I do this?
Should the classes themselves do it? That seems likely to lead to a lot of code duplication.
Should the utilities/engine that uses these classes do it? If so, an uninformed user may call a class method directly and not get any caching benefit.
Example code
public interface ILetter { string[] GetAnimalsThatStartWithMe(); }

public class A : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Aardvark", "Ant" };
    }
}

public class B : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Baboon", "Banshee" };
    }
}

/* ...Left to user to define... */
public class Z : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Zebra" };
    }
}

public static class LetterUtility
{
    public static string[] GetAnimalsThatStartWithLetter(char letter)
    {
        if (letter == 'A') return (new A()).GetAnimalsThatStartWithMe();
        if (letter == 'B') return (new B()).GetAnimalsThatStartWithMe();
        /* ... */
        if (letter == 'Z') return (new Z()).GetAnimalsThatStartWithMe();
        throw new ApplicationException("Letter " + letter + " not found");
    }
}
Should LetterUtility be responsible for caching? Should each individual instance of ILetter? Is there something else entirely that can be done?
I'm trying to keep this example short, so these example functions don't need caching. But suppose I add the following class, which makes (new C()).GetAnimalsThatStartWithMe() take 10 seconds every time it's run:
public class C : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        Thread.Sleep(10000);
        return new[] { "Cat", "Capybara", "Clam" };
    }
}
I find myself torn between making our software as fast as possible while maintaining less code (in this example: caching the result in LetterUtility) and repeating the exact same work over and over (in this example: waiting 10 seconds every time C is used).

Which layer is best responsible for caching of the results of these user-definable functions?
The answer is pretty obvious: the layer that can correctly implement the desired cache policy is the right layer.
A correct cache policy needs to have two characteristics:
It must never serve up stale data; it must know whether the method being cached is going to produce a different result, and invalidate the cache at some point before the caller would get stale data
It must manage cached resources efficiently on the user's behalf. A cache without an expiration policy that grows without bound has another name: we usually call it a "memory leak".
What's the layer in your system that knows the answers to the questions "is the cache stale?" and "is the cache too big?" That's the layer that should implement the cache.
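In the example from the question, those answers are easy to give: if the user-defined ILetter implementations are deterministic, the results never go stale, and the key space is bounded by the 26 letters, so no expiration policy is needed and the utility layer can safely own the cache. A minimal sketch under those assumptions (CachingLetterUtility is an invented name):

// A minimal sketch, assuming deterministic ILetter implementations and a bounded key space.
// Requires: using System.Collections.Concurrent;
public static class CachingLetterUtility
{
    private static readonly ConcurrentDictionary<char, string[]> cache =
        new ConcurrentDictionary<char, string[]>();

    public static string[] GetAnimalsThatStartWithLetter(char letter)
    {
        // Compute once per letter; later calls are served from the dictionary.
        return cache.GetOrAdd(letter, LetterUtility.GetAnimalsThatStartWithLetter);
    }
}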

Something like caching can be considered a "cross-cutting" concern (http://en.wikipedia.org/wiki/Cross-cutting_concern):
In computer science, cross-cutting concerns are aspects of a program which affect other concerns. These concerns often cannot be cleanly decomposed from the rest of the system in both the design and implementation, and can result in either scattering (code duplication), tangling (significant dependencies between systems), or both.
For instance, if writing an application for handling medical records, the bookkeeping and indexing of such records is a core concern, while logging a history of changes to the record database or user database, or an authentication system, would be cross-cutting concerns since they touch more parts of the program.
Cross-cutting concerns can often be implemented via aspect-oriented programming (http://en.wikipedia.org/wiki/Aspect-oriented_programming).
In computing, aspect-oriented programming (AOP) is a programming paradigm which aims to increase modularity by allowing the separation of cross-cutting concerns. AOP forms a basis for aspect-oriented software development.
There are many tools in .NET to facilitate Aspect Oriented Programming. I'm most fond of those that provide completely transparent implementation. In the example of caching:
public class Foo
{
    [Cache(10)] // cache for 10 minutes
    public virtual void Bar() { ... }
}
That's all you need to do...everything else happens automatically by defining a behavior like so:
public class CachingBehavior
{
    // This method intercepts any invocation of a method attributed with [Cache].
    // In the case of caching, it would check whether some cache store already contains
    // the data and return it if so; otherwise it performs the normal method operation
    // and stores the result.
    public void Intercept(IInvocation invocation) { ... }
}
There are two general schools for how this happens:
Post-build IL weaving. Tools like PostSharp, Microsoft CCI, and Mono Cecil can be configured to automatically rewrite these attributed methods so that they delegate to your behaviors.
Runtime proxies. Tools like Castle DynamicProxy and Microsoft Unity can automatically generate proxy types (a type derived from Foo that overrides Bar in the example above) that delegate to your behavior (a sketch follows below).
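A rough sketch of the runtime-proxy route using Castle DynamicProxy; the CacheAttribute, CachingInterceptor, key format, and expiry handling here are illustrative assumptions rather than anything the library provides out of the box:

// Requires the Castle.Core package.
using System;
using System.Collections.Concurrent;
using System.Linq;
using Castle.DynamicProxy;

[AttributeUsage(AttributeTargets.Method)]
public class CacheAttribute : Attribute
{
    public CacheAttribute(int minutes) { Minutes = minutes; }
    public int Minutes { get; }
}

public class CachingInterceptor : IInterceptor
{
    private readonly ConcurrentDictionary<string, (object Value, DateTime Expires)> cache =
        new ConcurrentDictionary<string, (object Value, DateTime Expires)>();

    public void Intercept(IInvocation invocation)
    {
        var attribute = invocation.Method
            .GetCustomAttributes(typeof(CacheAttribute), true)
            .Cast<CacheAttribute>()
            .FirstOrDefault();

        if (attribute == null)
        {
            invocation.Proceed();                   // not cached: call the real method
            return;
        }

        var key = invocation.Method.Name + "(" + string.Join(",", invocation.Arguments) + ")";

        if (cache.TryGetValue(key, out var entry) && entry.Expires > DateTime.UtcNow)
        {
            invocation.ReturnValue = entry.Value;   // serve from the cache
            return;
        }

        invocation.Proceed();                       // run the real method
        cache[key] = (invocation.ReturnValue, DateTime.UtcNow.AddMinutes(attribute.Minutes));
    }
}

// Usage: virtual methods on the proxied class go through the interceptor.
// var foo = new ProxyGenerator().CreateClassProxy<Foo>(new CachingInterceptor());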

Although I do not know C#, this seems like a case for using AOP (Aspect-Oriented Programming). The idea is that you can 'inject' code to be executed at certain points in the execution stack.
You can add the caching code as follows:
IF( InCache( object, method, method_arguments ) )
    RETURN Cache( object, method, method_arguments );
ELSE
    ExecuteMethod();
    StoreResultsInCache();
You then define that this code should be executed before every call to your interface functions (and to all subclasses implementing these functions as well).
Can some .NET expert enlighten us on how you would do this in .NET? One way is sketched below.
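One way to get this effect without any third-party AOP tooling is System.Reflection.DispatchProxy (available in .NET Core / .NET 5+). This is only a sketch: the CachingProxy type, the key format, and the lack of any expiration are assumptions for illustration.

using System;
using System.Collections.Concurrent;
using System.Reflection;

public class CachingProxy<T> : DispatchProxy where T : class
{
    private T target;
    private readonly ConcurrentDictionary<string, object> cache =
        new ConcurrentDictionary<string, object>();

    // Wrap an existing implementation so every interface call goes through Invoke.
    public static T Wrap(T target)
    {
        var proxy = Create<T, CachingProxy<T>>();
        ((CachingProxy<T>)(object)proxy).target = target;
        return proxy;
    }

    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        var key = targetMethod.Name + "(" + string.Join(",", args ?? Array.Empty<object>()) + ")";
        return cache.GetOrAdd(key, _ => targetMethod.Invoke(target, args));
    }
}

// Usage with the ILetter example from the first question:
// ILetter cached = CachingProxy<ILetter>.Wrap(new C());
// cached.GetAnimalsThatStartWithMe();  // slow once, then served from the cache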

In general, caching and memoisation make sense when:
Obtaining the result is (or at least can be) high-latency or otherwise more expensive than the overhead the caching itself introduces.
The results have a look-up pattern where there will be frequent calls with the same inputs to the function (that is, not just the arguments but any instance, static and other data that affects the result).
There isn't already a caching mechanism within the code that the code in question calls into which makes this unnecessary.
There won't be another caching mechanism within the code that calls the code in question which makes this unnecessary (which is why it almost never makes sense to memoise GetHashCode() within that method, despite people often being tempted to when the implementation is relatively expensive).
The result cannot become stale, is unlikely to become stale while the cache is live, does not matter if it becomes stale, or is easy to detect as stale.
There are cases where every use-case for a component will match all of these. There are many more where they will not. For example, if a component caches results but is never called twice with the same inputs by a particular client component, then that caching is just a waste that has had a negative impact upon performance (maybe negligible, maybe severe).
More often it makes much more sense for the client code to decide upon the caching policy that would suit it. It will also often be easier to tweak for a particular use at this point in the face of real-world data than in the component (since the real-world data it'll face could vary considerably from use to use).
It's even harder to know what degree of staleness could be acceptable. Generally, a component has to assume that 100% freshness is required from it, while the client component can know that a certain amount of staleness will be fine.
On the other hand, it can be easier for a component to obtain information that is of use to the cache. Components can work hand-in-hand in these cases, though it is much more involved (an example would be the If-Modified-Since mechanism used by RESTful webservices, where a server can indicate that a client can safely use information it has cached).
Also, a component can have a configurable caching policy. Connection pooling is a caching policy of sorts, consider how that's configurable.
So, in summary, caching belongs in the component that can work out what caching is both possible and useful:
That is most often the client code, though having the component's authors document its likely latency and staleness characteristics will help here.
It can less often be the client code with help from the component, though you have to expose details of the caching to allow that.
It can sometimes be the component itself, with the caching policy made configurable by the calling code (a sketch of this case follows below).
It can only rarely be the component alone, because it's rare for all possible use-cases to be served well by the same caching policy. One important exception is where the same instance of that component will serve multiple clients, because then the factors above are spread over those multiple clients.
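A sketch of that configurable-policy case, reusing LetterUtility from the question; CacheOptions and AnimalLookup are invented names, and the policy shown (an enable flag plus a time-to-live) is just one possible shape:

using System;
using System.Collections.Concurrent;

public class CacheOptions
{
    public bool Enabled { get; set; } = true;
    public TimeSpan TimeToLive { get; set; } = TimeSpan.FromMinutes(5);
}

public class AnimalLookup
{
    private readonly CacheOptions options;
    private readonly ConcurrentDictionary<char, (string[] Value, DateTime Expires)> cache =
        new ConcurrentDictionary<char, (string[] Value, DateTime Expires)>();

    // The calling code decides the policy; the component merely honours it.
    public AnimalLookup(CacheOptions options)
    {
        this.options = options;
    }

    public string[] GetAnimalsThatStartWithLetter(char letter)
    {
        if (options.Enabled && cache.TryGetValue(letter, out var entry) && entry.Expires > DateTime.UtcNow)
        {
            return entry.Value;
        }

        var result = LetterUtility.GetAnimalsThatStartWithLetter(letter);
        if (options.Enabled)
        {
            cache[letter] = (result, DateTime.UtcNow.Add(options.TimeToLive));
        }
        return result;
    }
}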

All of the previous posts brought up some good points. Here is a very rough outline of a way you might do it; I wrote this up on the fly, so it might need some tweaking:
interface IMemoizer<T, R>
{
    bool IsValid(T args);                   // Is the cached value still valid, or stale, etc.?
    bool TryLookup(T args, out R result);
    void StoreResult(T args, R result);
}

static class IMemoizerExtensions
{
    public static Func<T, R> Memoizing<T, R>(this IMemoizer<T, R> src, Func<T, R> method)
    {
        return args =>
        {
            R result;
            if (src.TryLookup(args, out result) && src.IsValid(args))
            {
                return result;
            }
            else
            {
                result = method(args);
                src.StoreResult(args, result);
                return result;
            }
        };
    }
}
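A possible usage with the ILetter example from the first question; DictionaryMemoizer is an invented implementation that never considers entries stale:

// Requires: using System; using System.Collections.Generic;
public class DictionaryMemoizer<T, R> : IMemoizer<T, R>
{
    private readonly Dictionary<T, R> store = new Dictionary<T, R>();

    public bool IsValid(T args) => true;    // nothing ever goes stale in this simple store
    public bool TryLookup(T args, out R result) => store.TryGetValue(args, out result);
    public void StoreResult(T args, R result) => store[args] = result;
}

// e.g., inside a method:
var memoizer = new DictionaryMemoizer<char, string[]>();
Func<char, string[]> lookup = memoizer.Memoizing<char, string[]>(LetterUtility.GetAnimalsThatStartWithLetter);

var slow = lookup('C');   // first call pays the 10 seconds
var fast = lookup('C');   // subsequent calls are served by the memoizer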

Related

Does an IO monad make sense in a language like C#

After spending a lot of time reading and thinking, I think I have finally grasped what monads are, how they work, and what they're useful for. My main goal was to figure out if monads were something I could apply to my daily work in C#.
When I started learning about monads, I got the impression that they are magical, and that they somehow make IO and other non-pure functions pure.
I understand the importance of monads for things like LINQ in .Net, and Maybe is very useful for dealing with functions that don't return valid values. And I also appreciate the need to limit statefulness in code and to isolate external dependencies, and I was hoping monads would help with those too.
But I've finally come to the conclusion that monads for IO and handling state are a necessity in Haskell, because Haskell has no other way to do it (otherwise you couldn't guarantee sequencing, and some calls would be optimized away). But for more mainstream languages, monads are not a good fit for these needs, since most languages already handle state and IO easily.
So, my question is, is it fair to say that the IO monad is really only useful in Haskell? Is there a good reason to implement an IO monad in, say, C#?
At work, we use monads to control IO in our C# code on our most important pieces of business logic. Two examples are our financial code and code that finds solutions to an optimization problem for our customers.
In our financial code, we use a monad to control IO writing to and reading from our database. It essentially consists of a small set of operations and an abstract syntax tree for the monad operations. You could imagine it's something like this (not actual code):
interface IFinancialOperationVisitor<T, out R> : IMonadicActionVisitor<T, R> {
    R GetTransactions(GetTransactions op);
    R PostTransaction(PostTransaction op);
}

interface IFinancialOperation<T> {
    R Accept<R>(IFinancialOperationVisitor<T, R> visitor);
}

class GetTransactions : IFinancialOperation<IError<IEnumerable<Transaction>>> {
    public Account Account { get; set; }
    public R Accept<R>(IFinancialOperationVisitor<IError<IEnumerable<Transaction>>, R> visitor) {
        return visitor.GetTransactions(this);
    }
}

class PostTransaction : IFinancialOperation<IError<Unit>> {
    public Transaction Transaction { get; set; }
    public R Accept<R>(IFinancialOperationVisitor<IError<Unit>, R> visitor) {
        return visitor.PostTransaction(this);
    }
}
which is essentially the Haskell code
data FinancialOperation a where
    GetTransactions :: Account -> FinancialOperation (Either Error [Transaction])
    PostTransaction :: Transaction -> FinancialOperation (Either Error Unit)
along with an abstract syntax tree for the construction of actions in a monad, essentially the free monad:
interface IMonadicActionVisitor<in T, out R> {
    R Return(T value);
    R Bind<TIn>(IMonadicAction<TIn> input, Func<TIn, IMonadicAction<T>> projection);
    R Fail(Errors errors);
}

// Objects to remember the arguments and pass them to the visitor, just like above.
/*
   Hopefully I got the variance right on everything for doing this without higher-order types,
   which is how we used to do this. We now use higher-order types in C#; more on that below.
   Here, to avoid a higher-order type, the AST for monadic actions is included by inheritance
   in the financial operation visitor.
*/
In the real code there are more of these, so we can remember that something was built by .Select() instead of .SelectMany() for efficiency. A financial operation, including intermediary computations, still has type IFinancialOperation<T>. The actual performance of the operations is done by an interpreter, which wraps all the database operations in a transaction and deals with how to roll that transaction back if any component is unsuccessful. We also use an interpreter for unit testing the code.
In our optimization code, we use a monad for controlling IO to get external data for optimization. This allows us to write code that is ignorant of how computations are composed, which lets us use exactly the same business code in multiple settings:
synchronous IO and computations for computations done on demand
asynchronous IO and computations for many computations done in parallel
mocked IO for unit tests
Since the code needs to be passed which monad to use, we need an explicit definition of a monad. Here's one. IEncapsulated<TClass,T> essentially means TClass<T>. This lets the c# compiler keep track of all three pieces of the type of monads simultaneously, overcoming the need to cast when dealing with monads themselves.
public interface IEncapsulated<TClass, out T>
{
    TClass Class { get; }
}

public interface IFunctor<F> where F : IFunctor<F>
{
    // Map
    IEncapsulated<F, B> Select<A, B>(IEncapsulated<F, A> initial, Func<A, B> projection);
}

public interface IApplicativeFunctor<F> : IFunctor<F> where F : IApplicativeFunctor<F>
{
    // Return / Pure
    IEncapsulated<F, A> Return<A>(A value);
    IEncapsulated<F, B> Apply<A, B>(IEncapsulated<F, Func<A, B>> projection, IEncapsulated<F, A> initial);
}

public interface IMonad<M> : IApplicativeFunctor<M> where M : IMonad<M>
{
    // Bind
    IEncapsulated<M, B> SelectMany<A, B>(IEncapsulated<M, A> initial, Func<A, IEncapsulated<M, B>> binding);
    // Bind and project
    IEncapsulated<M, C> SelectMany<A, B, C>(IEncapsulated<M, A> initial, Func<A, IEncapsulated<M, B>> binding, Func<A, B, C> projection);
}

public interface IMonadFail<M, TError> : IMonad<M> where M : IMonadFail<M, TError>
{
    // Fail
    IEncapsulated<M, A> Fail<A>(TError error);
}
Now we could imagine making another class of monad for the portion of IO our computations need to be able to see:
public interface IMonadGetSomething<M> : IMonadFail<M, Error> where M : IMonadGetSomething<M> {
    IEncapsulated<M, Something> GetSomething();
}
Then we can write code that doesn't know how computations are put together:
public class Computations {
    public IEncapsulated<M, IEnumerable<Something>> GetSomethings<M>(IMonadGetSomething<M> monad, int number)
        where M : IMonadGetSomething<M> {
        var result = monad.Return(Enumerable.Empty<Something>());
        // Our developers might still like writing imperative code
        for (int i = 0; i < number; i++) {
            // Query syntax assumes Select/SelectMany extension methods over IEncapsulated<M, >
            // that delegate to the monad exposed by .Class.
            result = from existing in result
                     from something in monad.GetSomething()
                     select existing.Concat(new[] { something });
        }
        return result.Select(x => x.ToList());
    }
}
This can be reused in both a synchronous and asynchronous implementation of an IMonadGetSomething<>. Note that in this code, the GetSomething()s will happen one after another until there's an error, even in an asynchronous setting. (No this is not how we build lists in real life)
I use Haskell and F# regularly and I've never really felt like using an IO or state monad in F#.
The main reason for me is that in Haskell, you can tell from the type of something that it doesn't use IO or state, and that's a really valuable piece of information.
In F# (and C#) there's no such general expectation on other people's code, and so you won't benefit much from adding that discipline to your own code, and you'll pay some general overhead (mainly syntactic) for sticking to it.
Monads also don't work too well on the .NET platform because of the lack of higher-kinded types: while you can write monadic code in F# with workflow syntax, and in C# with a bit more pain, you can't easily write code that abstracts over multiple different monads.
You ask "Do we need an IO monad in C#?" but you should ask instead "Do we need a way to reliably obtain purity and immutability in C#?".
The key benefit would be controlling side-effects. Whether you do that using monads or some other mechanism doesn't matter. For example, C# could allow you to mark methods as pure and classes as immutable. That would go a great way towards taming side-effects.
In such a hypothetical version of C# you'd try to make 90% of the computation pure, and have unrestricted, eager IO and side-effects in the remaining 10%. In such a world I do not see so much of a need for absolute purity and an IO monad.
Note, that by just mechanically converting side-effecting code to a monadic style you gain nothing. The code does not improve in quality at all. You improve the code quality by being 90% pure, and concentrating the IO into small, easily reviewable places.
The ability to know if a function has side effects just by looking at its signature is very useful when trying to understand what the function does. The less a function can do, the less you have to understand! (Polymorphism is another thing that helps restrict what a function can do with its arguments.)
In many languages that implement Software Transactional Memory, the documentation has warnings like the following:
I/O and other activities with side-effects should be avoided in
transactions, since transactions will be retried.
Having that warning become a prohibition enforced by the type system can make the language safer.
There are optimizations that can only be performed on code that is free of side effects. But the absence of side effects may be difficult to determine if you "allow anything" in the first place.
Another benefit of the IO monad is that, since IO actions are "inert" unless they lie in the path of the main function, it's easy to manipulate them as data, put them in containers, compose them at runtime, and so on.
Of course, the monadic approach to IO has its disadvantages. But it does have advantages besides "being one of the few ways of doing I/O in a pure lazy language in a flexible and principled manner".
As always, the IO monad is special and difficult to reason about. It's well known in the Haskell community that while IO is useful, it does not share many of the benefits other monads have. Its use is, as you've remarked, motivated largely by its privileged position rather than by it being a good modeling tool.
With that, I'd say it's not so useful in C# or, really, any language that isn't trying to completely contain side effects with type annotations.
But it's just one monad. As you've mentioned, Failure shows up in LINQ, but more sophisticated monads are useful even in a side-effecting language.
For instance, even with arbitrary global and local state environments, the state monad will indicate both the beginning and end of a regime of actions which work on some privileged kind of state. You don't get the side-effect elimination guarantees Haskell enjoys, but you still get good documentation.
To go further, introducing something like a Parser monad is a favorite example of mine. Having that monad, even in C#, is a great way to localize things like non-deterministic, backtracking failure performed while consuming a string. You can obviously do that with particular kinds of mutability, but Monads express that a particular expression performs a useful action in that effectful regime without regard to any global state you might also be involving.
So, I'd say yes, they're useful in any typed language. But IO as Haskell does it? Maybe not so much.
In a language like C# where you can do IO anywhere, an IO monad doesn't really have any practical use. The only thing you'd want to use it for is controlling side effects, and since there's nothing stopping you from performing side effects outside the monad, there's not really much point.
As for the Maybe monad, while it seems potentially useful, it only really works in a language with lazy evaluation. In the following Haskell expression, the second lookup isn't evaluated if the first returns Nothing:
doSomething :: String -> Maybe Int
doSomething name = do
    x <- lookup name mapA
    y <- lookup name mapB
    return (x+y)
This allows the expression to "short circuit" when a Nothing is encountered. An implementation in C# would have to perform both lookups (I think; I'd be interested to see a counter-example). You're probably better off with if statements.
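Since a counter-example is invited, here is a minimal sketch of a Maybe in C# (the Maybe<T> type, Lookup helper and MaybeExample class are invented). Because the second lookup sits inside a lambda passed to SelectMany, it only runs when the first lookup produced a value, so the short-circuiting does not actually depend on lazy evaluation:

using System;
using System.Collections.Generic;

public readonly struct Maybe<T>
{
    public bool HasValue { get; }
    public T Value { get; }
    private Maybe(T value) { HasValue = true; Value = value; }

    public static Maybe<T> Just(T value) => new Maybe<T>(value);
    public static Maybe<T> Nothing => default;

    public Maybe<R> SelectMany<R>(Func<T, Maybe<R>> bind) =>
        HasValue ? bind(Value) : Maybe<R>.Nothing;
}

public static class MaybeExample
{
    static Maybe<int> Lookup(string key, Dictionary<string, int> map) =>
        map.TryGetValue(key, out var value) ? Maybe<int>.Just(value) : Maybe<int>.Nothing;

    // Mirrors the Haskell doSomething above; the lambda containing the second
    // lookup is never invoked if the first lookup returns Nothing.
    public static Maybe<int> DoSomething(string name, Dictionary<string, int> mapA, Dictionary<string, int> mapB) =>
        Lookup(name, mapA).SelectMany(x =>
        Lookup(name, mapB).SelectMany(y =>
        Maybe<int>.Just(x + y)));
}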
Another issue is the loss of abstraction. While it's certainly possible to implement monads in C# (or things which look a little bit like monads), you can't really generalise like you can in Haskell, because C# doesn't have higher kinds. For example, a function like mapM :: Monad m => (a -> m b) -> [a] -> m [b] (which works for any monad) can't really be represented in C#. You could certainly have something like this:
public Maybe<List<b>> mapM<a, b>(Func<a, Maybe<b>> f, List<a> xs);
which would work for a specific monad (Maybe in this case), but it's not possible to abstract the Maybe away from that function. You'd have to be able to do something like this:
public m<List<b>> mapM<m, a, b>(Func<a, m<b>> f, List<a> xs);
which isn't possible in C#.
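For completeness, the Maybe-specific version is easy enough to write (this sketch assumes a Maybe<T> like the one in the previous answer, with HasValue, Value, Just and Nothing; MaybeTraversal is an invented name). The limitation is only that Maybe cannot be swapped for another monad generically:

using System;
using System.Collections.Generic;

public static class MaybeTraversal
{
    public static Maybe<List<B>> MapM<A, B>(Func<A, Maybe<B>> f, IEnumerable<A> items)
    {
        var results = new List<B>();
        foreach (var item in items)
        {
            var r = f(item);
            if (!r.HasValue)
            {
                return Maybe<List<B>>.Nothing;   // short-circuit on the first Nothing
            }
            results.Add(r.Value);
        }
        return Maybe<List<B>>.Just(results);
    }
}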

@Transactional equivalent in C# and concurrency for DDD application services

I'm reading Vaughn Vernon's book on Implementing Domain Driven design. I have also been going through the book code, C# version, from his github here.
The Java version of the book uses the @Transactional annotation, which I believe comes from the Spring framework.
public class ProductBacklogItemService
{
    @Transactional
    public void assignTeamMemberToTask(
        string aTenantId,
        string aBacklogItemId,
        string aTaskId,
        string aTeamMemberId)
    {
        BacklogItem backlogItem =
            backlogItemRepository.backlogItemOfId(
                new TenantId(aTenantId),
                new BacklogItemId(aBacklogItemId));

        Team ofTeam =
            teamRepository.teamOfId(
                backlogItem.tenantId(),
                backlogItem.teamId());

        backlogItem.assignTeamMemberToTask(
            new TeamMemberId(aTeamMemberId),
            ofTeam,
            new TaskId(aTaskId));
    }
}
What would be the equivalent manual implementation in C#? I'm thinking something along the lines of:
public class ProductBacklogItemService
{
    private static object lockForAssignTeamMemberToTask = new object();
    private static object lockForOtherAppService = new object();

    public void AssignTeamMemberToTask(string aTenantId,
        string aBacklogItemId,
        string aTaskId,
        string aTeamMemberId)
    {
        lock(lockForAssignTeamMemberToTask)
        {
            // application code as before
        }
    }

    public void OtherAppsService(string aTenantId)
    {
        lock(lockForOtherAppService)
        {
            // some other code
        }
    }
}
This leaves me with the following questions:
Do we lock by application service, or by repository? i.e. Should we not be doing backlogItemRepository.lock()?
When we are reading multiple repositories as part of our application service, how do we protect dependencies between repositories during transactions (where aggregate roots reference other aggregate roots by identity) - do we need to have interconnected locks between repositories?
Are there any DDD infrastructure frameworks that handle any of this locking?
Edit
Two useful answers came in suggesting transactions. As I haven't selected my persistence layer yet, I am using in-memory repositories; these are pretty raw and I wrote them myself (they don't have transaction support, as I don't know how to add it!).
I will design the system so I do not need to commit atomic changes to more than one aggregate root at the same time. I will, however, need to read consistently across a number of repositories (i.e. if a BacklogItemId is referenced from multiple other aggregates, we need to protect against race conditions should that BacklogItem be deleted).
So, can I get away with just using locks, or do I need to look at adding TransactionScope support on my in-memory repository?
TL;DR version
You need to wrap your code in a System.Transactions.TransactionScope. Be careful about multi-threading btw.
Full version
So the point of aggregates is that they define a consistency boundary. That means any change should leave the aggregate in a state that still honours its invariants. That's not necessarily the same as a transaction. Real transactions are a cross-cutting implementation detail, so they should probably be implemented as such.
A warning about locking
Don't do locking. Try to forget any notion you have of implementing pessimistic locking. To build scalable systems you have no real choice. The very fact that data takes time to be requested and to travel from disk to your screen means you have eventual consistency, so you should build for that. You can't really protect against race conditions as such; you just need to account for the fact that they could happen and be able to warn the "losing" user that their command failed. Often you can find these issues later on (seconds, minutes, hours, days, whatever your domain experts tell you the SLA is) and tell users so they can do something about it.
For example, imagine if two payroll clerks paid an employee's expenses at the same time with the bank. They would find out later on when the books were being balanced and take some compensating action to rectify the situation. You wouldn't want to scale down your payroll department to a single person working at a time in order to avoid these (rare) issues.
My implementation
Personally I use the Command Processor style, so all my Application Services are implemented as ICommandHandler<TCommand>. The CommandProcessor itself is the thing that looks up the correct handler and asks it to handle the command. This means that the CommandProcessor.Process(command) method can have its entire contents processed in a System.Transactions.TransactionScope.
Example:
public class CommandProcessor : ICommandProcessor
{
    public void Process(Command command)
    {
        using (var transaction = new TransactionScope())
        {
            var handler = LookupHandler(command);
            handler.Handle(command);
            transaction.Complete();
        }
    }
}
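For context, a minimal sketch of the handler shape assumed above; the command and handler names are invented, and the body would be the application service from the question reshaped as a handler:

public abstract class Command { }

public interface ICommandHandler<TCommand> where TCommand : Command
{
    void Handle(TCommand command);
}

public class AssignTeamMemberToTaskCommand : Command
{
    public string TenantId { get; set; }
    public string BacklogItemId { get; set; }
    public string TaskId { get; set; }
    public string TeamMemberId { get; set; }
}

public class AssignTeamMemberToTaskHandler : ICommandHandler<AssignTeamMemberToTaskCommand>
{
    public void Handle(AssignTeamMemberToTaskCommand command)
    {
        // repository calls as in the original AssignTeamMemberToTask method;
        // the surrounding TransactionScope is owned by the CommandProcessor.
    }
}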
You've not gone for this approach, so to make your transactions a cross-cutting concern you're going to need to move them a level higher in the stack. This is highly dependent on the tech you're using (ASP.NET, WCF, etc.), so if you add a bit more detail there might be an obvious place to put this stuff.
Locking wouldn't allow any concurrency on those code paths.
I think you're looking for a transaction scope instead.
I don't know what persistence layer you are going to use, but the standard ones like ADO.NET, Entity Framework, etc. support the TransactionScope semantics:
using (var tr = new TransactionScope())
{
    doStuff();
    tr.Complete();
}
The transaction is committed if tr.Complete() is called. In any other case it is rolled back.
Typically, the aggregate is a unit of transactional consistency. If you need the transaction to spread across multiple aggregates, then you should probably reconsider your model.
lock(lockForAssignTeamMemberToTask)
{
    // application code as before
}
This takes care of synchronization. However, you also need to revert the changes in case of any exception. So, the pattern will be something like:
lock(lockForAssignTeamMemberToTask)
{
    try {
        // application code as before
    } catch (Exception e) {
        // rollback/restore previous values
    }
}

thoughts on configuration through delegates

I'm working on a fork of the Divan CouchDB library, and ran into a need to set some configuration parameters on the HttpWebRequest that's used behind the scenes. At first I started threading the parameters through all the layers of constructors and method calls involved, but then decided: why not pass in a configuration delegate?
So, in a more generic scenario,
given:
class Foo {
    private parm1, parm2, ... , parmN;

    public Foo(parm1, parm2, ... , parmN) {
        this.parm1 = parm1;
        this.parm2 = parm2;
        ...
        this.parmN = parmN;
    }

    public Bar DoWork() {
        var r = new externallyKnownResource();
        r.parm1 = parm1;
        r.parm2 = parm2;
        ...
        r.parmN = parmN;
        r.doStuff();
    }
}
do:
class Foo {
    private Action<externallyKnownResource> configurator;

    public Foo(Action<externallyKnownResource> configurator) {
        this.configurator = configurator;
    }

    public Bar DoWork() {
        var r = new externallyKnownResource();
        configurator(r);
        r.doStuff();
    }
}
The latter seems a lot cleaner to me, but it does expose to the outside world that class Foo uses externallyKnownResource.
Thoughts?
This can lead to cleaner looking code, but has a huge disadvantage.
If you use a delegate for your configuration, you lose a lot of control over how the objects get configured. The problem is that the delegate can do anything - you can't control what happens here. You're letting a third party run arbitrary code inside of your constructors, and trusting them to do the "right thing." This usually means you end up having to write a lot of code to make sure that everything was set up properly by the delegate, or you can wind up with very brittle, easy-to-break classes.
It becomes much more difficult to verify that the delegate properly sets up each requirement, especially as you go deeper into the tree. Usually, the verification code ends up much messier than the original code would have been, passing parameters through the hierarchy.
I may be missing something here, but it seems like a big disadvantage to create the externallyKnownResource object down in DoWork(). This precludes easy substitution of an alternate implementation.
Why not:
public Bar DoWork( IExternallyKnownResource r ) { ... }
IMO, you're best off accepting a configuration object as a single parameter to your Foo constructor, rather than a dozen (or so) separate parameters.
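As a sketch of that configuration-object approach (FooConfig and its members are hypothetical; externallyKnownResource, Bar and doStuff() come from the pseudocode in the question):

class FooConfig {
    public int Parm1 { get; set; }
    public string Parm2 { get; set; }
    // ...only the settings Foo actually supports
}

class Foo {
    private FooConfig config;

    public Foo(FooConfig config) {
        this.config = config;
    }

    public Bar DoWork() {
        var r = new externallyKnownResource();
        // Foo stays in control of how the resource gets configured.
        r.parm1 = config.Parm1;
        r.parm2 = config.Parm2;
        return r.doStuff();
    }
}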
Edit:
There's no one-size-fits-all solution, no. But the question is fairly simple. I'm writing something that consumes an externally known entity (HttpWebRequest) that's already self-validating and has a ton of potentially necessary parameters. My options, really, are to re-create almost all of the configuration parameters this has and shuttle them in every time, or to put the onus on the consumer to configure it as they see fit. – kolosy
The problem with your request is that in general it is poor class design to make the user of the class configure an external resource, even if it's a well-known or commonly used resource. It is better class design to have your class hide all of that from the user of your class. That means more work in your class, yes, passing configuration information to your external resource, but that's the point of having a separate class. Otherwise why not just have the caller of your class do all the work on your external resource? Why bother with a separate class in the first place?
Now, if this is an internal class doing some simple utility work for another class that you will always control, then you're fine. But don't expose this type of paradigm publicly.

Method knows too much about methods it's calling

I have a method that I want to be "transactional" in the abstract sense. It calls two methods that happen to do stuff with the database, but this method doesn't know that.
public void DoOperation()
{
    using (var tx = new TransactionScope())
    {
        Method1();
        Method2();
        tx.Complete();
    }
}

public void Method1()
{
    using (var connection = new DbConnectionScope())
    {
        // Write some data here
    }
}

public void Method2()
{
    using (var connection = new DbConnectionScope())
    {
        // Update some data here
    }
}
Because in real terms the TransactionScope means that a database transaction will be used, we have an issue where it could well be promoted to a Distributed Transaction, if we get two different connections from the pool.
I could fix this by wrapping the DoOperation() method in a ConnectionScope:
public void DoOperation()
{
    using (var tx = new TransactionScope())
    using (var connection = new DbConnectionScope())
    {
        Method1();
        Method2();
        tx.Complete();
    }
}
I made DbConnectionScope myself for just such a purpose, so that I don't have to pass connection objects to sub-methods (this is a more contrived example than my real issue). I got the idea from this article: http://msdn.microsoft.com/en-us/magazine/cc300805.aspx
However, I don't like this workaround, as it means DoOperation now has knowledge that the methods it's calling may use a connection (and possibly a different connection each). How could I refactor this to resolve the issue?
One idea I'm thinking of is creating a more general OperationScope, so that, when teamed up with a custom Castle Windsor lifestyle I'll write, any component requested from the container with an OperationScopeLifestyle will always get the same instance of that component. This does solve the problem, because OperationScope is more ambiguous than DbConnectionScope.
I'm seeing conflicting requirements here.
On the one hand, you don't want DoOperation to have any awareness of the fact that a database connection is being used for its sub-operations.
On the other hand, it clearly is aware of this fact because it uses a TransactionScope.
I can sort of understand what you're getting at when you say you want it to be transactional in the abstract sense, but my take on this is that it's virtually impossible (no, scratch that - completely impossible) to describe a transaction in such abstract terms. Let's just say you have a class like this:
class ConvolutedBusinessLogic
{
    public void Splork(MyWidget widget)
    {
        if (widget.Validate())
        {
            widgetRepository.Save(widget);
            widget.LastSaved = DateTime.Now;
            OnSaved(new WidgetSavedEventArgs(widget));
        }
        else
        {
            Log.Error("Could not save MyWidget due to a validation error.");
            SendEmailAlert(new WidgetValidationAlert(widget));
        }
    }
}
This class is doing at least two things that probably can't be rolled back (setting the property of a class and executing an event handler, which might for example cascade-update some controls on a form), and at least two more things that definitely can't be rolled back (appending to a log file somewhere and sending out an e-mail alert).
Perhaps this seems like a contrived example, but that is actually my point; you can't treat a TransactionScope as a "black box". The scope is in fact a dependency like any other; TransactionScope just provides a convenient abstraction for a unit of work that may not always be appropriate because it doesn't actually wrap a database connection and can't predict the future. In particular, it's normally not appropriate when a single logical operation needs to span more than one database connection, whether those connections are to the same database or different ones. It tries to handle this case of course, but as you've already learned, the result is sub-optimal.
The way I see it, you have a few different options:
Make explicit the fact that Method1 and Method2 require a connection by having them take a connection parameter, or by refactoring them into a class that takes a connection dependency (constructor or property). This way, the connection becomes part of the contract, so Method1 no longer knows too much - it knows exactly what it's supposed to know according to the design. (A sketch of this option follows at the end of this answer.)
Accept that your DoOperation method does have an awareness of what Method1 and Method2 do. In fact, there is nothing wrong with this! It's true that you don't want to be relying on implementation details of some future call, but forward dependencies in the abstraction are generally considered OK; it's reverse dependencies you need to be concerned about, like when some class deep in the domain model tries to update a UI control that it has no business knowing about in the first place.
Use a more robust Unit of Work pattern (also: here). This is getting to be more popular and it is, by and large, the direction Microsoft has gone in with Linq to SQL and EF (the DataContext/ObjectContext are basically UOW implementations). This sleeves in well with a DI framework and essentially relieves you of the need to worry about when transactions start and end and how the data access has to occur (the term is "persistence ignorance"). This would probably require significant rework of your design, but pound for pound it's going to be the easiest to maintain long-term.
Hope one of those helps you.
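As a rough sketch of the first option above (making the connection an explicit dependency); OperationWriter and the OpenConnection() factory are invented for illustration, and it requires using System.Data; and using System.Transactions;:

public class OperationWriter
{
    private readonly IDbConnection connection;

    public OperationWriter(IDbConnection connection)
    {
        this.connection = connection;   // both methods share the same connection by contract
    }

    public void Method1() { /* write some data using connection */ }
    public void Method2() { /* update some data using connection */ }
}

public void DoOperation()
{
    using (var tx = new TransactionScope())
    using (var connection = OpenConnection())   // hypothetical connection factory
    {
        // A single connection opened inside the scope avoids promotion
        // to a distributed transaction.
        var writer = new OperationWriter(connection);
        writer.Method1();
        writer.Method2();
        tx.Complete();
    }
}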

Using delegates instead of interfaces for decoupling. Good idea?

When writing GUI apps I use a top level class that "controls" or "coordinates" the application. The top level class would be responsible for coordinating things like initialising network connections, handling application wide UI actions, loading configuration files etc.
At certain stages in the GUI app control is handed off to a different class, for example the main control swaps from the login screen to the data entry screen once the user authenticates. The different classes need to use functionality of objects owned by the top level control. In the past I would simply pass the objects to the subordinate controls or create an interface. Lately I have changed to passing method delegates instead of whole objects with the two main reasons being:
It's a lot easier to mock a method than a class when unit testing,
It makes the code more readable by documenting in the class constructor exactly which methods subordinate classes are using.
Some simplified example code is below:
delegate bool LoginDelegate(string username, string password);
delegate void UpdateDataDelegate(BizData data);
delegate void PrintDataDelegate(BizData data);

class MainScreen {
    private MyNetwork m_network;
    private MyPrinter m_printer;
    private LoginScreen m_loginScreen;
    private DataEntryScreen m_dataEntryScreen;

    public MainScreen() {
        m_network = new MyNetwork();
        m_printer = new MyPrinter();
        m_loginScreen = new LoginScreen(m_network.Login);
        m_dataEntryScreen = new DataEntryScreen(m_network.Update, m_printer.Print);
    }
}

class LoginScreen {
    LoginDelegate Login_External;

    public LoginScreen(LoginDelegate login) {
        Login_External = login;
    }
}

class DataEntryScreen {
    UpdateDataDelegate UpdateData_External;
    PrintDataDelegate PrintData_External;

    public DataEntryScreen(UpdateDataDelegate updateData, PrintDataDelegate printData) {
        UpdateData_External = updateData;
        PrintData_External = printData;
    }
}
My question is: while I prefer this approach and it makes good sense to me, how is the next developer that comes along going to find it? In sample and open-source C# code, interfaces are the preferred approach for decoupling, whereas this approach of using delegates leans more towards functional programming. Am I likely to get subsequent developers swearing under their breath at what is, to them, a counter-intuitive approach?
It's an interesting approach. You may want to pay attention to two things:
Like Philip mentioned, when you have a lot of methods to define, you will end up with a big constructor. This will cause deep coupling between classes. One more or one less delegate will require everyone to modify the signature. You should consider making them public properties and using some DI framework.
Breaking down the implementation to the method level can be too granular sometimes. With class/interface, you can group methods by the domain/functionality. If you replace them with delegates, they can be mixed up and become difficult to read/maintain.
It seems the number of delegates is an important factor here.
While I can certainly see the positive side of using delegates rather than an interface, I have to disagree with both of your bullet points:
"It's a lot easier to mock a method than a class when unit testing". Most mock frameworks for c# are built around the idea of mocking a type. While many can mock methods, the samples and documentation (and focus) are normally around types. Mocking an interface with one method is just as easy or easier to mock than a method.
"It makes the code more readable by documenting in the class constructor exactly which methods subordinate classes are using." Also has it's cons - once a class needs multiple methods, the constructors get large; and once a subordinate class needs a new property or method, rather than just modifying the interface you must also add it to allthe class constructors up the chain.
I'm not saying this is a bad approach by any means - passing functions rather than types does clearly state what you are doing and can reduce your object model complexity. However, in c# your next developer will probably see this as odd or confusing (depending on skill level). Mixing bits of OO and Functional approaches will probably get a raised eyebrow at the very least from most developers you will work with.
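A hedged sketch of the interface-based equivalent of the code in the question (IAuthenticator, IDataUpdater and IPrinter are invented role names):

interface IAuthenticator { bool Login(string username, string password); }
interface IDataUpdater   { void Update(BizData data); }
interface IPrinter       { void Print(BizData data); }

class LoginScreen {
    private readonly IAuthenticator authenticator;
    public LoginScreen(IAuthenticator authenticator) {
        this.authenticator = authenticator;
    }
}

class DataEntryScreen {
    private readonly IDataUpdater updater;
    private readonly IPrinter printer;
    public DataEntryScreen(IDataUpdater updater, IPrinter printer) {
        this.updater = updater;
        this.printer = printer;
    }
}

// MyNetwork would implement IAuthenticator and IDataUpdater; MyPrinter would implement IPrinter.
// Adding a method to one of these roles changes the interface, not every constructor up the chain.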
