I'll begin this question by apologizing for the length of the post. To save you some time up front: my problem is that the class pattern I've got stuck in my head is clearly flawed, and I can't see a good solution.
In a project I'm working on, I need to run algorithms on chunks of data; let's call them DataCache. Sometimes these algorithms return results that themselves need to be cached, so I devised the following scheme.
I have an Algorithm base class that looks like this:
abstract class Algorithm<T>
{
protected abstract T ExecuteAlgorithmLogic(DataCache dataCache);
private readonly Dictionary<DataCache, WeakReference> _resultsWeak = new Dictionary<DataCache, WeakReference>();
private readonly Dictionary<DataCache, T> _resultsStrong = new Dictionary<DataCache, T>();
public T ComputeResult(DataCache dataCache, bool save = false)
{
if (_resultsStrong.ContainsKey(dataCache))
return _resultsStrong[dataCache];
if (_resultsWeak.ContainsKey(dataCache))
{
var temp = _resultsWeak[dataCache].Target;
if (temp != null) return (T) temp;
}
var result = ExecuteAlgorithmLogic(dataCache);
_resultsWeak[dataCache] = new WeakReference(result, true);
if (save) _resultsStrong[dataCache] = result;
return result;
}
}
If you call ComputeResult() and provide a DataCache, you can optionally choose to cache the result. Also, if you're lucky, the result might still be there if the GC hasn't collected it yet. The size of each DataCache is in the hundreds of megabytes, and before you ask: there are about 10 arrays in each, which hold basic types such as int and float.
My idea here was that an actual algorithm would look something like this:
class ActualAgorithm : Algorithm<SomeType>
{
protected override SomeType ExecuteAlgorithmLogic(DataCache dataCache)
{
//Elves be here
}
}
And I would define tens of .cs files, each for one algorithm. There are two problems with this approach. Firstly, in order for this to work, I need to instantiate my algorithms and keep that instance around (or the results are not cached and the entire point is moot). But then I end up with an unsightly singleton pattern implementation in each derived class. It would look something like this:
class ActualAgorithm : Algorithm<SomeType>
{
protected override SomeType ExecuteAlgorithmLogic(DataCache dataCache)
{
//Elves and dragons be here
}
protected ActualAgorithm(){ }
private static ActualAgorithm _instance;
public static ActualAgorithm Instance
{
get
{
_instance = _instance ?? new ActualAgorithm();
return _instance;
}
}
}
So in each implementation I would have to duplicate the code for the singleton pattern. And secondly, tens of .cs files also sounds like overkill, since what I'm really after is just a single function returning some results that can be cached for various DataCache objects. Surely there must be a smarter way of doing this, and I would greatly appreciate a nudge in the right direction.
What I meant by my comment was something like this:
abstract class BaseClass<K,T> where T : BaseClass<K,T>, new()
{
private static T _instance;
public static T Instance
{
get
{
_instance = _instance ?? new T();
return _instance;
}
}
}
class ActualClass : BaseClass<int, ActualClass>
{
public ActualClass() {}
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine(ActualClass.Instance.GetType().ToString());
Console.ReadLine();
}
}
The only problem here is that you'll have a public constructor.
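If the public constructor is the sticking point, one possible workaround (just a sketch, not something I've tested against your exact hierarchy) is to drop the new() constraint and create the instance via reflection, since Activator.CreateInstance can reach a non-public parameterless constructor:
abstract class BaseClass<K, T> where T : BaseClass<K, T>
{
private static T _instance;
public static T Instance
{
get
{
// The bool overload of Activator.CreateInstance allows invoking a
// non-public parameterless constructor, so new() is no longer required.
_instance = _instance ?? (T)Activator.CreateInstance(typeof(T), nonPublic: true);
return _instance;
}
}
}
class ActualClass : BaseClass<int, ActualClass>
{
protected ActualClass() { } // the constructor no longer has to be public
}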
I refined my previous answer, but as it is rather different from the other approach I proposed, I thought I might just make it another answer. First, we'll need to declare some interfaces:
// Where to find cached data
interface DataRepository {
void cacheData(Key k, Data d);
Data retrieveData(Key k, Data d);
};
// If by any chance we need an algorithm somewhere
interface AlgorithmRepository {
Algorithm getAlgorithm(Key k);
}
// The algorithm that process data
interface Algorithm {
void processData(Data in, Data out);
}
Given these interfaces, we can define some basic implementation for the algorithm repository:
class BaseAlgorithmRepository implements AlgorithmRepository {
// The algorithm dictionary
Map<Key, Algorithm> algorithms;
// On init, we'll build our repository using this function
void setAlgorithmForKey(Key k, Algorithm a) {
algorithms.put(k, a);
}
// ... implement the other function of the interface
}
Then we can also implement something for the DataRepository
class BaseDataRepository implements DataRepository {
AlgorithmRepository algorithmRepository;
Map<Key, Data> cache;
void cacheData(Key k, Data d) {
cache.put(k, d);
}
Data retrieveData(Key k, Data in) {
Data d = cache.get(k);
if (d==null) {
// Data not found in the cache, so we try to produce it ourselves
d = new Data();
Algorithm a = algorithmRepository.getAlgorithm(k);
a.processData(in, d);
// This is optional, you could simply throw an exception to say that the
// data has not been cached and thus, the algorithm succession did not
// produce the necessary data. So instead of the above, you could simply:
// throw new DataNotCached(k);
// and thus halt the whole processing
}
return d;
}
}
Finally, we get to implement algorithms:
abstract class BaseAlgorithm implements Algorithm {
DataRepository repository;
}
class SampleNoCacheAlgorithm extends BaseAlgorithm {
void processData(Data in, Data out) {
// do something with in to compute out
}
}
class SampleCacheProducerAlgorithm extends BaseAlgorithm {
static Key KEY = "SampleCacheProducerAlgorithm.myKey";
void processData(Data in, Data out) {
// do something with in to compute out
// then call repository.cacheData(KEY, out);
}
}
class SampleCacheConsumerAlgorithm extends BaseAlgorithm {
void processData(Data in, Data out) {
// Data tmp = repository.retrieveData(SampleCacheProducerAlgorithm.KEY, in);
// do something with in and tmp to compute out
}
}
To build on this, I think you could also define some special kinds of algorithms that are just in fact composites of other algorithms but also implement the Algorithm interface. An example could be:
class AlgorithmChain extends BaseAlgorithm {
List<Algorithm> chain;
void processData(Data in, Data out) {
Data currentIn = in;
Data currentOut = out;
for (Algorithm a : chain) {
currentOut = new Data();
a.processData(currentIn, currentOut);
currentIn = currentOut;
}
out = currentOut;
}
}
One addition I would make to this is a DataPool that would allow you to reuse existing but unused Data objects, in order to avoid allocating lots of memory each time you make a new Data().
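A minimal sketch of such a DataPool, written in C# since that's what your project uses (Data stands in for your actual data type), could be:
// A very small object pool: hand out a recycled Data instance when one is
// available, otherwise allocate a new one. Callers give instances back with Release().
class DataPool
{
private readonly Stack<Data> _free = new Stack<Data>();
public Data Acquire()
{
return _free.Count > 0 ? _free.Pop() : new Data();
}
public void Release(Data d)
{
// Optionally clear or reset d here before it gets reused.
_free.Push(d);
}
}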
I think this set of classes could give a good basis for your whole architecture, with the additional benefit that it does not employ any singleton (references to the concerned objects are always passed around). This also means that implementing dummy classes for unit tests would be rather easy.
You could have your algorithms independent of their results:
class Engine<T> {
Map<AlgorithmKey, Algorithm<T>> algorithms;
Map<AlgorithmKey, Data> algorithmsResultCache;
T processData(Data in);
}
interface Algorithm<T> {
boolean doesResultNeedsToBeCached();
T processData(Data in);
}
Then your Engine is responsible for instantiating the algorithms, which are just pieces of code whose input is data and whose output is either null or some data. Each algorithm can say whether its result needs to be cached or not.
In order to refine my answer, I think you should give some more detail about how the algorithms are to be run (is there an order, is it user-adjustable, do we know in advance which algorithms will be run, ...).
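To make that lookup-then-cache flow concrete, here is a rough C# sketch of what the Engine could do (AlgorithmKey and Data are placeholders, and keying the result cache by algorithm alone is a simplification; in your case you would probably want the DataCache to be part of the key too):
interface IAlgorithm<T>
{
bool DoesResultNeedToBeCached();
T ProcessData(Data input);
}
class Engine<T>
{
private readonly Dictionary<AlgorithmKey, IAlgorithm<T>> _algorithms = new Dictionary<AlgorithmKey, IAlgorithm<T>>();
private readonly Dictionary<AlgorithmKey, T> _resultCache = new Dictionary<AlgorithmKey, T>();
public void Register(AlgorithmKey key, IAlgorithm<T> algorithm)
{
_algorithms[key] = algorithm;
}
public T ProcessData(AlgorithmKey key, Data input)
{
// Return a cached result if the algorithm asked for caching on an earlier run.
T cached;
if (_resultCache.TryGetValue(key, out cached))
return cached;
var algorithm = _algorithms[key];
var result = algorithm.ProcessData(input);
if (algorithm.DoesResultNeedToBeCached())
_resultCache[key] = result;
return result;
}
}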
Can you register your algorithm instances with a combined repository/factory of algorithms that'll keep references to them? The repository could be a singleton, and, if you give the repository control of algorithm instantiation, you could use it to ensure that only one instance of each existed.
public class AlgorithmRepository
{
//... use boilerplate singleton code
public void CreateAlgorithm(Algorithms algorithm)
{
//... add to some internal hash or map, checking that it hasn't been created already
//... Algorithms is just an enum telling it which to create (clunky factory
// implementation)
}
public void ComputeResult(Algorithms algorithm, DataCache datacache)
{
// Can lazy load algorithms here and make CreateAlgorithm private ..
CreateAlgorithm(algorithm);
//... compute and return.
}
}
This said, having a separate class (and .cs file) for each algorithm makes sense to me. You could break with convention and put multiple algorithm classes in a single .cs file if they're lightweight; that makes things easier to manage if you're worried about the number of files -- there are worse things to do. FWIW I'd just put up with the number of files ...
Typically when you create a Singleton class you don't want to inherit from it. When you do, you lose some of the goodness of the Singleton pattern (and what I hear from the pattern zealots is that an angel loses its wings every time you do something like this). But let's be pragmatic... sometimes you do what you have to do.
Regardless I do not think combining generics and inheritance will work in this instance anyway.
You indicated the number of algorithms will be in the tens (not hundreds). As long as that is the case, I would create a dictionary keyed off of System.Type and store references to your methods as the values of the dictionary. In this case I used
Func<DataCache, object> as the dictionary value signature.
When the class is instantiated for the first time, register all your available algorithms in the dictionary. At runtime, when the class needs to execute an algorithm for type T, it will get the Type of T and look up the algorithm in the dictionary.
If the code for the algorithms will be relatively involved I would suggest splitting them off into partial classes just to keep your code readable.
public sealed partial class Algorithm<T>
{
private static object ExecuteForSomeType(DataCache dataCache)
{
return new SomeType();
}
}
public sealed partial class Algorithm<T>
{
private static object ExecuteForSomeOtherType(DataCache dataCache)
{
return new SomeOtherType();
}
}
public sealed partial class Algorithm<T>
{
private readonly Dictionary<System.Type, Func<DataCache, object>> _algorithms = new Dictionary<System.Type, Func<DataCache, object>>();
private readonly Dictionary<DataCache, WeakReference> _resultsWeak = new Dictionary<DataCache, WeakReference>();
private readonly Dictionary<DataCache, T> _resultsStrong = new Dictionary<DataCache, T>();
private Algorithm() { }
private static Algorithm<T> _instance;
public static Algorithm<T> Instance
{
get
{
if (_instance == null)
{
_instance = new Algorithm<T>();
_instance._algorithms.Add(typeof(SomeType), ExecuteForSomeType);
_instance._algorithms.Add(typeof(SomeOtherType), ExecuteForSomeOtherType);
}
return _instance;
}
}
public T ComputeResult(DataCache dataCache, bool save = false)
{
if (_resultsStrong.ContainsKey(dataCache))
return _resultsStrong[dataCache];
if (_resultsWeak.ContainsKey(dataCache))
{
var cached = _resultsWeak[dataCache].Target;
if (cached != null) return (T)cached;
}
// Look up the registered algorithm by the result type T rather than by a dummy instance.
var returnValue = (T)_algorithms[typeof(T)](dataCache);
_resultsWeak[dataCache] = new WeakReference(returnValue, true);
if (save) _resultsStrong[dataCache] = returnValue;
return returnValue;
}
}
First off, I'd suggest you rename DataCache to something like DataInput for more clarity, because it's easy to confuse it with objects that really act as caches (_resultsWeak and _resultsStrong) to store the results.
Concerning the need for these caches to remain in memory for future use, maybe you should consider placing them in one of the wider scopes a .NET application offers beyond object scope, Application or Session for example.
You could also use an AlgorithmLocator (see ServiceLocator pattern) as a single point of access to all Algorithms to get rid of the singleton logic duplication in each Algorithm.
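A bare-bones AlgorithmLocator along those lines might look like this (just a sketch; the registration API and names are assumptions):
// Single point of access for algorithm instances: register once at startup, resolve by type.
static class AlgorithmLocator
{
private static readonly Dictionary<Type, object> _algorithms = new Dictionary<Type, object>();
public static void Register<TAlgorithm>(TAlgorithm instance)
{
_algorithms[typeof(TAlgorithm)] = instance;
}
public static TAlgorithm Resolve<TAlgorithm>()
{
return (TAlgorithm)_algorithms[typeof(TAlgorithm)];
}
}
// Usage: AlgorithmLocator.Register(new ActualAgorithm()); then later
// AlgorithmLocator.Resolve<ActualAgorithm>().ComputeResult(dataCache);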
Other than that, I find your solution a nice one overall. Whether or not it is overkill will basically depend on how homogeneous your algorithms are. If they all cache data and return their results the same way, it will be a great benefit to have all that logic factored out in a single place. But we lack context here to judge.
Encapsulating the caching logic in a specific object held by the Algorithm (a CachingStrategy?) would also be an alternative to inheriting it, but maybe a bit awkward, since the caching object would have to access the cache before and after calculation, be able to trigger the algorithm calculation itself, and have a hand on the results.
[Edit] if you're concerned with having one .cs file per algorithm, you can always group all Algorithm classes pertaining to a particular T in the same file.
I am refactoring some existing code.
We have a list of investors with amounts assigned to each. The total of amounts should be equal to another total, but sometimes there are a couple of cents of difference, so we use different algorithms to assign these differences to each investor.
The current code is something like this:
public void Round(IList<Investors> investors, Enum algorithm, [here goes a list of many parameters]) {
// some checks and logic here - OMITTED FOR BREVITY
// pick method given algorithm Enum
if (algorithm == Enum.Algorithm1) {
SomeStaticClass.Algorithm1(investors, remainders, someParameter1, someParameter2, someParameter3, someParameter4)
} else if (algorithm == Enum.Algorithm2) {
SomeStaticClass.Algorithm2(investors, remainders, someParameter3)
}
}
So far we only have two algorithms, and I have to implement the third one. I've been given the chance to refactor both existing implementations as well as write some generic code so this works for future algorithms, possibly custom to each client.
My first thought was "ok, this is a strategy pattern". But the problem I see is that the algorithms receive different parameter lists (apart from the first two parameters). And future algorithms can receive a different list of parameters as well. The only things "in common" are the investor list and the remainders.
How can I design this so I have a cleaner interface?
I thought of:
- Establishing an interface with ALL possible parameters, and sharing it among all implementations.
- Using an object with all possible parameters as properties, and using that generic object as part of the interface (see the sketch after this list). I would have three parameters: the list of investors, the remainders object, and a "parameters" object. But in this case I have a similar problem: instantiating each object and filling the required properties depends on the algorithm (unless I set all of them). I would have to use a factory (or something) to instantiate it, using all parameters in the interface, am I right? I would just be moving the too-many-parameters problem to that "factory" or whatever.
- Using a dynamic object instead of a statically typed object. Still presents the same problems as before, namely the instantiation.
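Roughly, what I have in mind for that second option is something like this (RoundingParameters and Remainders are made-up names, just for illustration):
// Parameter object: carries every value any rounding algorithm might need;
// each concrete algorithm only reads the properties it cares about.
public class RoundingParameters
{
public decimal SomeParameter1 { get; set; }
public decimal SomeParameter2 { get; set; }
public int SomeParameter3 { get; set; }
public DateTime SomeParameter4 { get; set; }
}
public interface IRoundingAlgorithm
{
void Round(IList<Investors> investors, Remainders remainders, RoundingParameters parameters);
}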
I also thought of using the Visitor Pattern, but as I understand, that would be the case if I had different algorithms for different entities to use, like, another class of investors. So I don't think it is the right approach.
So far the one that convinces me the most is the second, although I am still a bit hesitant about it.
Any ideas?
Thanks
Strategy has different implementations. It's straightforward when all of the alternate concrete strategies require the same type signature. But when concrete implementations start asking for different data from the Context, we have to take a step back and relax encapsulation ("breaking encapsulation" is a known drawback of Strategy): we can pass the Context to the strategies either through the method signature or through the constructor, depending on how much is needed.
By using interfaces and breaking big object trees into smaller containments, we can restrict access to most of the Context state.
The following code demonstrates passing the Context through a method parameter.
public class Context {
private String name;
private int id;
private double salary;
Strategy strategy;
void contextInterface(){
strategy.algorithmInterface(this);
}
public String getName() {
return name;
}
public int getId() {
return id;
}
public double getSalary() {
return salary;
}
}
public interface Strategy {
// WE CAN NOT DECIDE COMMON SIGNATURE HERE
// AS ALL IMPLEMENTATIONS REQUIRE DIFF PARAMS
void algorithmInterface(Context context);
}
public class StrategyA implements Strategy{
@Override
public void algorithmInterface(Context context) {
// OBSERVE HERE BREAKING OF ENCAPSULATION
// BY OPERATING ON SOMEBODY ELSE'S DATA
context.getName();
context.getId();
}
}
public class StrategyB implements Strategy{
@Override
public void algorithmInterface(Context context) {
// OBSERVE HERE BREAKING OF ENCAPSULATION
// BY OPERATING ON SOMEBODY ELSE'S DATA
context.getSalary();
context.getId();
}
}
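For comparison, a minimal sketch of the constructor-injection alternative mentioned above (written in C# to match the question; the names are purely illustrative):
public interface IStrategy
{
void Execute();
}
// Constructor injection: the strategy is handed only the slice of context it
// needs when it is built, instead of pulling it from a full Context on each call.
public class StrategyA : IStrategy
{
private readonly string _name;
private readonly int _id;
public StrategyA(string name, int id)
{
_name = name;
_id = id;
}
public void Execute()
{
// operate on _name and _id only
}
}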
Okay, I might be going in the wrong direction... but it seems kinda weird that you're passing in arguments for all the algorithms, plus the identifier for which algorithm to actually use. Shouldn't the Round() function ideally just get what it needs to operate?
I'm imagining the function that invokes Round() to look something like:
if (something)
algToUse = Enum.Algorithm1;
else
if (otherthing)
algToUse = Enum.Algorithm2;
else
algToUse = Enum.Algorithm3;
Round(investors, remainder, algToUse, dayOfMonth, lunarCycle, numberOfGoblinsFound, etc);
... what if, instead, you did something like this:
public abstract class RoundingAlgorithm
{
public abstract void PerformRounding(IList<Investors> investors, int remainders);
}
public class RoundingRandomly : RoundingAlgorithm
{
private int someNum;
private DateTime anotherParam;
public RoundingRandomly(int someNum, DateTime anotherParam)
{
this.someNum = someNum;
this.anotherParam = anotherParam;
}
public override void PerformRounding(IList<Investors> investors, int remainder)
{
// ... code ...
}
}
// ... and other subclasses of RoundingAlgorithm
// ... later on:
public void Round(IList<Investors> investors, RoundingAlgorithm roundingMethodToUse)
{
// ...your other code (checks, etc)...
roundingMethodToUse.PerformRounding(investors, remainders);
}
... and then your earlier function simply looks like:
RoundingAlgorithm roundingMethod;
if (something)
roundingMethod = new RoundingByStreetNum(1, "asdf", DateTime.Now);
else
if (otherthing)
roundingMethod = new RoundingWithPrejudice(null);
else
roundingMethod = new RoundingDefault(1000);
Round(investors, roundingMethod);
... basically, instead of populating that Enum value, just create a RoundingAlgorithm object and pass that in to Round() instead.
We have a Web API library that calls into a Business/Service library (where our business logic is located), which in turn calls a Data access library (Repository).
We use this type of data transfer object all over the place. It has a "Payers" property that we may have to filter (meaning, manipulate its value). I have gone about implementing that check as shown below, but it feels dirty to me, as I'm calling the same function all over the place. I have thought about either:
- Using an attribute filter to handle this, or
- Making the RequestData a property on the class, and doing the filtering in the constructor.
Any additional thoughts or design patterns where this could be designed more efficiently?
public class Example
{
private MyRepository _repo = new MyRepository();
private void FilterRequestData(RequestData data)
{
//will call into another class that may or may not alter RequestData.Payers
}
public List<ReturnData> GetMyDataExample1(RequestData data)
{
FilterRequestData(data);
return _repo.GetMyDataExample1(data);
}
public List<ReturnData> GetMyDataExample2(RequestData data)
{
FilterRequestData(data);
return _repo.GetMyDataExample2(data);
}
public List<ReturnData> GetMyDataExample3(RequestData data)
{
FilterRequestData(data);
return _repo.GetMyDataExample3(data);
}
}
public class RequestData
{
List<string> Payers {get;set;}
}
One way of dealing with repeated code like that is to use a strategy pattern with a Func (and potentially some generics depending on your specific case). You could refactor that into separate classes and everything but the basic idea looks like that:
public class MyRepository
{
internal List<ReturnData> GetMyDataExample1(RequestData arg) { return new List<ReturnData>(); }
internal List<ReturnData> GetMyDataExample2(RequestData arg) { return new List<ReturnData>(); }
internal List<ReturnData> GetMyDataExample3(RequestData arg) { return new List<ReturnData>(); }
}
public class ReturnData { }
public class Example
{
private MyRepository _repo = new MyRepository();
private List<ReturnData> FilterRequestDataAndExecute(RequestData data, Func<RequestData, List<ReturnData>> action)
{
// call into another class that may or may not alter RequestData.Payers
// and then execute the actual code, potentially with some standardized exception management around it
// or logging or anything else really that would otherwise be repeated
return action(data);
}
public List<ReturnData> GetMyDataExample1(RequestData data)
{
// call the shared filtering/logging/exception mgmt/whatever code and pass some additional code to execute
return FilterRequestDataAndExecute(data, _repo.GetMyDataExample1);
}
public List<ReturnData> GetMyDataExample2(RequestData data)
{
// call the shared filtering/logging/exception mgmt/whatever code and pass some additional code to execute
return FilterRequestDataAndExecute(data, _repo.GetMyDataExample2);
}
public List<ReturnData> GetMyDataExample3(RequestData data)
{
// call the shared filtering/logging/exception mgmt/whatever code and pass some additional code to execute
return FilterRequestDataAndExecute(data, _repo.GetMyDataExample3);
}
}
public class RequestData
{
List<string> Payers { get; set; }
}
This sort of thinking naturally leads to aspect oriented programming.
It's specifically designed to handle cross-cutting concerns (e.g. here, your filter function cuts across your query logic.)
As #dnickless suggests, you can do this in an ad-hoc way by refactoring your calls to remove the duplicated code.
More general solutions exist, such as PostSharp, which give you a slightly cleaner way of structuring code along aspects. It is proprietary, but I believe the free tier gives you enough to investigate an example like this. At the very least it's interesting to see how it would look in PostSharp, and whether you think it improves things at all! (It makes strong use of attributes, which extends your first suggestion.)
(N.B. I'm not practically suggesting installing another library for a simple case like this, but highlighting how these types of problems might be examined in general.)
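For the curious, a rough sketch of how such an aspect might look in PostSharp (the API details are from memory, so treat the names as assumptions and check the PostSharp docs before relying on them):
using PostSharp.Aspects;
using PostSharp.Serialization;
// Hypothetical aspect: runs before every method it decorates and filters the
// RequestData argument, keeping the repository-facing methods free of boilerplate.
[PSerializable]
public class FilterRequestDataAttribute : OnMethodBoundaryAspect
{
public override void OnEntry(MethodExecutionArgs args)
{
if (args.Arguments.Count > 0 && args.Arguments[0] is RequestData data)
{
// call into the class that may or may not alter data.Payers
}
}
}
// Usage (hypothetical):
// [FilterRequestData]
// public List<ReturnData> GetMyDataExample1(RequestData data) { return _repo.GetMyDataExample1(data); }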
I don't usually code C#, and when I do, I suck.
I have a parent class and two derived classes. Both derived classes share an expensive calculation, which differs slightly for the second one. I am trying to avoid calculating it more than once.
However, what I want is something like this:
interface ICalculator
{
double getValue(int id);
void setContext(int c);
}
abstract class CalculatorBase: ICalculator
{
internal static Dictionary<int, double> output = null;
internal void loadData()
{
//load data
}
internal void computeAll()
{
//do expenseive calculation and set output
output = something
}
public abstract double getValue(int id);
public abstract void setContext(int c);
}
class ChildCalculator1 : CalculatorBase
{
public override void setContext(int c)
{
if (output !=null)
return;
loadData();
computeAll();
}
public override double getValue(int id)
{
return output[id];
}
}
class ChildCalculator2 : CalculatorBase
{
public override void setContext(int c)
{
if (output !=null)
return;
loadData();
computeAll();
}
public override double getValue(int id)
{
return output[id] -1;
}
}
requirements:
If ChildCalculator1 or ChildCalculator2 or both (one after another) are called, computeAll should only run once.
However, if you reload the page, I want it to be calculated again. In other words, I want to calculate once per page load.
Question: How can I access the parent property (output) from two different child instances (ChildCalculator1, ChildCalculator2) in such a way that, when the page is reloaded, that property (output) gets recalculated? Currently I made output static, but it doesn't change when I reload the page.
A static variable might not be the right thing, since statics survive for the lifetime of the application, not just the page load. How can I dispose of it once the page load is done, or is there anything else you can suggest?
Your code isn't so bad... but it could definitely be better. :)
You are correct that the static dictionary will not get garbage collected. (In C# the Garbage Collector frees unused memory.) You need all instances of the calculators to share your dictionary, and you want to dispose of it when you are done. You could implement a little factory that builds the calculators and gives them all a single instance of the dictionary. A very simple way to do this, however, is just to manually manage the static dictionary.
If you add the following method in CalculatorBase
public static void DoneWithCalculations()
{
// By removing your static reference to your dictionary you
// allow the GC to free the memory.
output = null;
}
You can then call this static method when you are all done with your calculators (for instance at the end of PageLoad) like so...
CalculatorBase.DoneWithCalculations();
This will do what you need and doesn't force you to work in C# more than you have to. :)
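If you'd rather go the little-factory route mentioned above instead of the static field, a minimal sketch could look like the following. It assumes (and this is a change to your code, not something you have today) that CalculatorBase and its children take the shared dictionary through their constructors rather than using a static field:
// Hypothetical factory: owns the shared results dictionary for one page load and
// hands it to every calculator it creates. When the factory goes out of scope
// after the page load, the dictionary becomes collectable as well.
public class CalculatorFactory
{
private readonly Dictionary<int, double> _sharedOutput = new Dictionary<int, double>();
public ChildCalculator1 CreateCalculator1()
{
return new ChildCalculator1(_sharedOutput); // assumes a constructor overload taking the dictionary
}
public ChildCalculator2 CreateCalculator2()
{
return new ChildCalculator2(_sharedOutput);
}
}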
I have a generic factory which caches an instance before return it (simplified code):
static class Factory<T>
where T : class, new()
{
private static T instance;
public static T GetInstance()
{
if (instance == null) instance = new T();
return instance;
}
}
I want to replace this approach with a non-caching one, to show that caching makes no sense in terms of instantiation performance (I believe new object creation is very cheap).
So I want to write a load test which will create a good number, say 1000, of dynamic, runtime-only types and feed them to my factories. One will cache, and the other will not.
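For what it's worth, a rough sketch of such a load test (using Reflection.Emit to generate the throwaway types; note that both factories are invoked through the same reflection path so the reflection overhead is equal for both, but it may still dominate the numbers, so you might want to cache the closed GetInstance MethodInfo or use compiled delegates for a sharper comparison):
using System;
using System.Diagnostics;
using System.Reflection;
using System.Reflection.Emit;
static class FactoryLoadTest
{
// Non-caching counterpart to Factory<T>: always constructs a fresh instance.
static class NonCachingFactory<T> where T : class, new()
{
public static T GetInstance() { return new T(); }
}
static void Main()
{
// Emit 1000 empty public classes at runtime; each gets a default constructor automatically.
var module = AppDomain.CurrentDomain
.DefineDynamicAssembly(new AssemblyName("LoadTestTypes"), AssemblyBuilderAccess.Run)
.DefineDynamicModule("Main");
var types = new Type[1000];
for (int i = 0; i < types.Length; i++)
types[i] = module.DefineType("Generated" + i, TypeAttributes.Public).CreateType();
Console.WriteLine("Caching:     " + Time(typeof(Factory<>), types) + " ms");
Console.WriteLine("Non-caching: " + Time(typeof(NonCachingFactory<>), types) + " ms");
}
// Close the open generic factory over each generated type and call GetInstance once per type.
static long Time(Type openFactory, Type[] types)
{
var clock = Stopwatch.StartNew();
foreach (var t in types)
{
var getInstance = openFactory.MakeGenericType(t)
.GetMethod("GetInstance", BindingFlags.Public | BindingFlags.Static);
getInstance.Invoke(null, null);
}
clock.Stop();
return clock.ElapsedMilliseconds;
}
}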
Here's my two cents, although I agree with jgauffin's and Daniel Hilgarth's answers. Caching through a generic type with a static member in this way will create an additional constructed type for every type that is cached, but it is important to understand how this works differently for reference and value types. With a reference type as T, the additional generic types produced should use fewer resources than the equivalent usage with a value type.
So when should you use the generic type technique for producing a cache? Below are a few important criteria that I use.
1. You want to allow caching single instances of each class of interest.
2. You would like to use compile time generic type constraints to enforce rules on the types used in the cache. With type constraints you can enforce the need for an instance to implement several interfaces without having to define a base type for those classes.
3. You don't need to remove items from the cache for the lifetime of the AppDomain.
By the way one term that may be useful to search on is "Code Explosion" which is a general term used to define cases where a considerable amount of code is needed to perform some regularly occurring task and that generally grows linearly or worse with the growth of project requirements. In terms of generic types, I've heard and will generally use the term "type explosion" to describe the proliferation of types as you begin to combine and compose several generic types.
Another important point is that in these cases a factory and the cache can always be separated, and in most cases they can be given an identical interface. That would allow you to substitute either the factory (new instance per call) or the cache, which essentially wraps the factory and delegates through the same interface, in cases where you want to use one or the other depending on things such as type-explosion concerns. Your cache could also take on more responsibility, such as a more sophisticated caching strategy where particular types are cached differently (e.g. reference types vs. value types). If you're curious about this, the trick is to define the generic class that does the caching as a private class within the actual concrete type that implements the interface for your factory. I can give an example if you would like.
Update with example code as requested:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
namespace CacheAndFactory
{
class Program
{
private static int _iterations = 1000;
static void Main(string[] args)
{
var factory = new ServiceFactory();
// Exercise the factory which implements IServiceSource
AccessAbcTwoTimesEach(factory);
// Exercise the generics cache which also implements IServiceSource
var cache1 = new GenericTypeServiceCache(factory);
AccessAbcTwoTimesEach(cache1);
// Exercise the collection based cache which also implements IServiceSource
var cache2 = new CollectionBasedServiceCache(factory);
AccessAbcTwoTimesEach(cache2);
Console.WriteLine("Press any key to continue");
Console.ReadKey();
}
public static void AccessAbcTwoTimesEach(IServiceSource source)
{
Console.WriteLine("Excercise " + source.GetType().Name);
Console.WriteLine("1st pass - Get an instance of A, B, and C through the source and access the DoSomething for each.");
source.GetService<A>().DoSomething();
source.GetService<B>().DoSomething();
source.GetService<C>().DoSomething();
Console.WriteLine();
Console.WriteLine("2nd pass - Get an instance of A, B, and C through the source and access the DoSomething for each.");
source.GetService<A>().DoSomething();
source.GetService<B>().DoSomething();
source.GetService<C>().DoSomething();
Console.WriteLine();
var clock = Stopwatch.StartNew();
for (int i = 0; i < _iterations; i++)
{
source.GetService<A>();
source.GetService<B>();
source.GetService<C>();
}
clock.Stop();
Console.WriteLine("Accessed A, B, and C " + _iterations + " times each in " + clock.ElapsedMilliseconds + "ms through " + source.GetType().Name + ".");
Console.WriteLine();
Console.WriteLine();
}
}
public interface IService
{
}
class A : IService
{
public void DoSomething() { Console.WriteLine("A.DoSomething(), HashCode: " + this.GetHashCode()); }
}
class B : IService
{
public void DoSomething() { Console.WriteLine("B.DoSomething(), HashCode: " + this.GetHashCode()); }
}
class C : IService
{
public void DoSomething() { Console.WriteLine("C.DoSomething(), HashCode: " + this.GetHashCode()); }
}
public interface IServiceSource
{
T GetService<T>()
where T : IService, new();
}
public class ServiceFactory : IServiceSource
{
public T GetService<T>()
where T : IService, new()
{
// I'm using Activator here just as an example
return Activator.CreateInstance<T>();
}
}
public class GenericTypeServiceCache : IServiceSource
{
IServiceSource _source;
public GenericTypeServiceCache(IServiceSource source)
{
_source = source;
}
public T GetService<T>()
where T : IService, new()
{
var serviceInstance = GenericCache<T>.Instance;
if (serviceInstance == null)
{
serviceInstance = _source.GetService<T>();
GenericCache<T>.Instance = serviceInstance;
}
return serviceInstance;
}
// NOTE: This technique will cause all service instances cached here
// to be shared amongst all instances of GenericTypeServiceCache which
// may not be desirable in all applications while in others it may
// be a performance enhancement.
private class GenericCache<T>
{
public static T Instance;
}
}
public class CollectionBasedServiceCache : IServiceSource
{
private Dictionary<Type, IService> _serviceDictionary;
IServiceSource _source;
public CollectionBasedServiceCache(IServiceSource source)
{
_serviceDictionary = new Dictionary<Type, IService>();
_source = source;
}
public T GetService<T>()
where T : IService, new()
{
IService serviceInstance;
if (!_serviceDictionary.TryGetValue(typeof(T), out serviceInstance))
{
serviceInstance = _source.GetService<T>();
_serviceDictionary.Add(typeof(T), serviceInstance);
}
return (T)serviceInstance;
}
private class GenericCache<T>
{
public static T Instance;
}
}
}
Basically to summarize, the code above is a console app that has the concept of an interface to provide for an abstraction of a service source. I used an IService generic constraint just to show an example of how it could matter. I don't want to type or post 1000 separate type definitions so I did the next best thing and created three classes - A, B, and C - and accessed them each 1000 times using each technique - repetitive instantiation, generic type cache, and collection based cache.
With a small set of accesses the difference is negligible, but of course my service constructor is simplistic (a default parameterless constructor), so it does not calculate anything, access a database, read configuration, or do any of the things that typical service classes do when they are constructed. If that were not the case, the benefit of some caching strategy would obviously be greater. Even with the default constructor, in the case of 1,000,000 accesses there is still a dramatic difference between not caching and caching (3s vs. 120ms). So the lesson is: if you are doing high-volume accesses, or complex calculations that require frequent access through the factory, caching will be not just beneficial but verging on a necessity, depending on whether it impacts user perception or time-sensitive business processes; otherwise the benefits are negligible. The important thing to remember is that it's not just instantiation time you have to worry about, but also the load on the garbage collector.
Sounds to me like your colleague wants to do premature optimization. Caching objects is seldom a good idea. Instantiation is cheap, and I would only cache objects where it's proven that it will be faster. A high-performance socket server would be such a case.
But to answer your question: caching objects will always be faster. Keeping them in a LinkedList or something like that will keep the overhead small, and performance should not decrease as the number of objects grows.
So if you are willing to accept larger memory consumption and increased complexity, go for a cache.
I'm not sure exactly how to describe this question, but here goes. I've got a class hierarchy of objects that are mapped in a SQLite database. I've already got all the non-trivial code written that communicates between the .NET objects and the database.
I've got a base interface as follows:
public interface IBackendObject
{
void Read(int id);
void Refresh();
void Save();
void Delete();
}
This is the basic CRUD operations on any object. I've then implemented a base class that encapsulates much of the functionality.
public abstract class ABackendObject : IBackendObject
{
protected ABackendObject() { } // constructor used to instantiate new objects
protected ABackendObject(int id) { Read(id); } // constructor used to load object
public void Read(int id) { ... } // implemented here is the DB code
}
Now, finally, I have my concrete child objects, each of which have their own tables in the database:
public class ChildObject : ABackendObject
{
public ChildObject() : base() { }
public ChildObject(int id) : base(id) { }
}
This works fine for all my purposes so far. The child has several callback methods that are used by the base class to instantiate the data properly.
I now want to make this slightly more efficient. For example, in the following code:
public void SomeFunction1()
{
ChildObject obj = new ChildObject(1);
obj.Property1 = "blah!";
obj.Save();
}
public void SomeFunction2()
{
ChildObject obj = new ChildObject(1);
obj.Property2 = "blah!";
obj.Save();
}
In this case, I'll be constructing two completely separate instances in memory, and depending on the order in which SomeFunction1 and SomeFunction2 are called, either Property1 or Property2 may not be saved. What I want to achieve is a way for both of these instantiations to somehow point to the same memory location. I don't think that will be possible if I'm using the "new" keyword, so I was looking for hints as to how to proceed.
Ideally, I'd want to store a cache of all loaded objects in my ABackendObject class and return memory references to the already loaded objects when requested, or load the object from memory if it doesn't already exist and add it to the cache. I've got a lot of code that is already using this framework, so I'm of course going to have to change a lot of stuff to get this working, but I just wanted some tips as to how to proceed.
Thanks!
If you want to store a "cache" of loaded objects, you could easily just have each type maintain a Dictionary<int, IBackendObject> which holds loaded objects, keyed by their ID.
Instead of using a constructor, build a factory method that checks the cache:
public abstract class ABackendObject<T> where T : class
{
public T LoadFromDB(int id) {
T obj = this.CheckCache(id);
if (obj == null)
{
obj = this.Read(id); // Load the object
this.SaveToCache(id, obj);
}
return obj;
}
}
If you make your base class generic, and Read virtual, you should be able to provide most of this functionality without much code duplication.
What you want is an object factory. Make the ChildObject constructor private, then write a static method ChildObject.Create(int index) which returns a ChildObject, but which internally ensures that different calls with the same index return the same object. For simple cases, a simple static hash of index => object will be sufficient.
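A minimal sketch of that factory (an identity map keyed by the record id, reusing the ABackendObject base from the question) could be:
public class ChildObject : ABackendObject
{
// index => already-loaded instance; later Create calls with the same index return the same object
private static readonly Dictionary<int, ChildObject> _loaded = new Dictionary<int, ChildObject>();
private ChildObject(int id) : base(id) { }
public static ChildObject Create(int index)
{
ChildObject obj;
if (!_loaded.TryGetValue(index, out obj))
{
obj = new ChildObject(index);
_loaded[index] = obj;
}
return obj;
}
}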
If you're using .NET Framework 4, you may want to have a look at the System.Runtime.Caching namespace, which gives you a pretty powerful cache architecture.
http://msdn.microsoft.com/en-us/library/system.runtime.caching.aspx
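For example, MemoryCache from that namespace could back the identity map; a sketch (the sliding expiration value is arbitrary, and ChildObjectStore is a made-up name):
using System.Runtime.Caching;
public static class ChildObjectStore
{
private static readonly MemoryCache _cache = MemoryCache.Default;
public static ChildObject GetOrLoad(int id)
{
string key = "ChildObject:" + id;
var cached = _cache.Get(key) as ChildObject;
if (cached != null)
return cached;
var obj = new ChildObject(id); // loads from the database via the base constructor
_cache.Set(key, obj, new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(10) });
return obj;
}
}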
Sounds perfect for a reference count like this...
#region Begin/End Update
int refcount = 0;
ChildObject record;
protected ChildObject ActiveRecord
{
get
{
return record;
}
set
{
record = value;
}
}
public void BeginUpdate()
{
if (refcount == 0)
{
ActiveRecord = new ChildObject(1);
}
Interlocked.Increment(ref refcount);
}
public void EndUpdate()
{
int count = Interlocked.Decrement(ref refcount);
if (count == 0)
{
ActiveRecord.Save();
}
}
#endregion
#region operations
public void SomeFunction1()
{
BeginUpdate();
try
{
ActiveRecord.Property1 = "blah!";
}
finally
{
EndUpdate();
}
}
public void SomeFunction2()
{
BeginUpdate();
try
{
ActiveRecord.Property2 = "blah!";
}
finally
{
EndUpdate();
}
}
public void SomeFunction3() // combines 1 and 2 under a single Begin/EndUpdate
{
BeginUpdate();
try
{
SomeFunction1();
SomeFunction2();
}
finally
{
EndUpdate();
}
}
#endregion
I think you're on the right track, more or less. You can either create a factory which creates your child objects (and can track "live" instances), or you can keep track of instances which have been saved, so that when you call your Save method it recognizes that your first instance of ChildObject is the same as your second instance of ChildObject and does a deep copy of the data from the second instance over to the first. Both of these are fairly non-trivial from a coding standpoint, and both probably involve overriding the equality methods on your entities. I tend to think the first approach would be less likely to cause errors.
One additional option would be to use an existing Object-Relational mapping package like NHibernate or Entity Framework to do your mapping between objects and your database. I know NHibernate supports SQLite, and in my experience it tends to be the one that requires the least amount of change to your entity structures. Going that route, you get the benefit of the ORM layer tracking instances for you (and generating SQL for you), plus you would probably get some more advanced features your current data access code may not have. The downside is that these frameworks tend to have a learning curve, and depending on which one you go with, there could be a not insignificant impact on the rest of your code. So it would be worth weighing the benefits against the cost of learning the framework and converting your code to use its API.