Publishing to multiple subscribers in Rx - C#

I am investigating how to develop a plugin framework for a project, and Rx seems like a good fit for what I am trying to achieve. Ultimately, the project will be a set of plugins (modular functionality) that can be configured via XML to do different things. The requirements are as follows:
Enforce a modular architecture even within a plugin. This encourages loose coupling and potentially minimizes complexity. This hopefully should make individual plugin functionality easier to model and test
Enforce immutability with respect to data to reduce complexity and ensure that state management within modules is kept to a minimum
Discourage manual thread creation by providing thread pool threads to do work within modules wherever possible
In my mind, a plugin is essentially a data transformation entity. This means a plugin does one of the following:
Takes in some data and transforms it in some way to produce new data (Not shown here)
Generates data in itself and pushes it out to observers
Takes in some data and does some work on the data without notifying outsiders
If you take the concept further, a plugin can consist of a number of modules of all three types above. For example, within a plugin you could have an IntGenerator module that feeds data to a ConsoleWorkUnit module, and so on. So what I am trying to model in the main function is the wiring that a plugin would use to do its work.
To that end, I have the following base classes, using Microsoft's immutable collections NuGet package (System.Collections.Immutable). What I am trying to achieve is to abstract away the Rx calls so they can be used in modules; the ultimate aim is to wrap up calls to Buffer etc. in abstract classes that can be used to compose complex queries and modules. This way the code is somewhat self-documenting, rather than requiring the reader to dig through all the code within a module to find out that it subscribes to a buffer or window of type x, etc.
public abstract class OutputBase<TOutput> : SendOutputBase<TOutput>
{
public abstract void Work();
}
public interface IBufferedBase<TOutput>
{
void Work(IList<ImmutableList<Data<TOutput>>> list);
}
public abstract class BufferedWorkBase<TInput> : IBufferedBase<TInput>
{
public abstract void Work(IList<ImmutableList<Data<TInput>>> input);
}
public abstract class SendOutputBase<TOutput>
{
private readonly ReplaySubject<ImmutableList<Data<TOutput>>> _outputNotifier;
private readonly IObservable<ImmutableList<Data<TOutput>>> _observable;
protected SendOutputBase()
{
_outputNotifier = new ReplaySubject<ImmutableList<Data<TOutput>>>(10);
_observable = _outputNotifier
    .SubscribeOn(ThreadPoolScheduler.Instance)
    .ObserveOn(ThreadPoolScheduler.Instance); // chained, so both schedulers apply
}
protected void SetOutputTo(ImmutableList<Data<TOutput>> output)
{
_outputNotifier.OnNext(output);
}
public void ConnectOutputTo(IWorkBase<TOutput> unit)
{
_observable.Subscribe(unit.Work);
}
public void BufferOutputTo(int count, IBufferedBase<TOutput> unit)
{
_observable.Buffer(count).Subscribe(unit.Work);
}
}
public abstract class WorkBase<TInput> : IWorkBase<TInput>
{
public abstract void Work(ImmutableList<Data<TInput>> input);
}
public interface IWorkBase<TInput>
{
void Work(ImmutableList<Data<TInput>> input);
}
public class Data<T>
{
private readonly T _value;
private Data(T value)
{
_value = value;
}
public static Data<TData> Create<TData>(TData value)
{
return new Data<TData>(value);
}
public T Value { get { return _value; } }
}
These base classes are used to create three classes: one that generates some int data, one that prints the data as it arrives, and one that buffers the data as it comes in and sums the values in threes.
public class IntGenerator : OutputBase<int>
{
public override void Work()
{
var list = ImmutableList<Data<int>>.Empty;
var builder = list.ToBuilder();
for (var i = 0; i < 1000; i++)
{
builder.Add(Data<int>.Create(i));
}
SetOutputTo(builder.ToImmutable());
}
}
public class ConsoleWorkUnit : WorkBase<int>
{
public override void Work(ImmutableList<Data<int>> input)
{
foreach (var data in input)
{
Console.WriteLine("ConsoleWorkUnit printing {0}", data.Value);
}
}
}
public class SumPrinter : WorkBase<int>
{
public override void Work(ImmutableList<Data<int>> input)
{
input.ToObservable().Buffer(2).Subscribe(PrintSum);
}
private void PrintSum(IList<Data<int>> obj)
{
Console.WriteLine("Sum of {0}, {1} is {2} ", obj.First().Value,obj.Last().Value ,obj.Sum(x=>x.Value) );
}
}
These are run in a Main method like this:
var intgen = new IntGenerator();
var cons = new ConsoleWorkUnit();
var sumPrinter = new SumPrinter();
intgen.ConnectOutputTo(cons);
intgen.BufferOutputTo(3,sumPrinter);
Task.Factory.StartNew(intgen.Work);
Console.ReadLine();
Is this architecture sound?

You are buffering your observable (.Buffer(count)) so that it only signals after count notifications arrive.
However, your IntGenerator.Work only ever produces a single value: one ImmutableList pushed through OnNext. Thus you never "fill" the buffer and trigger downstream notifications.
Either change Work so that it eventually produces more values, or have it complete the observable stream when it finishes its work; Buffer will release the remaining buffered values when the stream completes. To do this, something in IntGenerator.Work needs to cause a call to _outputNotifier.OnCompleted().
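For example (a minimal sketch building on the classes above, not the asker's code), SendOutputBase could expose completion and IntGenerator could call it once it has pushed its data:

// In SendOutputBase<TOutput>:
protected void CompleteOutput()
{
    _outputNotifier.OnCompleted();
}

// At the end of IntGenerator.Work():
// SetOutputTo(builder.ToImmutable());
// CompleteOutput(); // lets Buffer(3) flush its final partial buffer downstream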

Related

Is there a design pattern for one input passed to n methods that each return the input for the next method

I'd like to know if there is a design pattern for this problem:
One input is used to construct an object (via constructor or via method return, I don't care); that object is fed to the next method or constructor. This is repeated over a user-specified set of processors, obviously throwing exceptions if there is a break in the chain of required inputs for the processors.
The output of all, some or none of the implemented processors is the same object.
I've got about 6 processors planned, possibly more in future.
Composition:
I'm not sure I like the composition design pattern because not every object is intended to be an output of this process and I can't think how to not output null values without the user knowing it's going to be null.
Chain of responsibility:
Chain of responsibility is the way to go according to what I've heard, however I'm not sure I understand it. Is this design pattern suggesting to pass n function pointers to a function that runs through each? If so, I'm not sure how to set up the function that gets passed n function pointers.
my attempt:
I've got two interfaces that are inherited by n classes, i.e. (FirstProcessor, FirstInput, FirstOutput, SecondProcessor, SecondOutput, ThirdProcessor, ..., NProcessor, NOutput):
interface IChainedOutput
{
    IChainedOutput Input { get; }
    FinalOutputObj GetFinalOutput();
}
interface IChainedProcessor
{
    IChainedOutput Run(IChainedOutput previousOutput);
}
used like this:
IChainedOutput previous = processorList.First().Run(originalInput);
foreach (IChainedProcessor processor in processorList.Skip(1))
{
    previous = processor.Run(previous);
}
FinalOutputObj output = previous.GetFinalOutput();
Problems:
FinalOutputObj is coupled with all the processor implementations, which is bad. It's not composed of all the IChainedOutput child class members, but it uses a good subset of them to calculate other values.
FinalOutputObj is being composed in a bad way, and I don't see how I can escape outputting null values if the list of processors does not contain every processor implemented.
There is a lot of downcasting in the constructors of the child classes, which is a red flag for OOP. However, the inputs for each block of processing logic are completely different: the first input is a couple of vectors, the second input is the output of the first (which includes a handful of custom types and more vectors), etc.
Each IChainedOutput contains the reference to the inputs used to create it. Currently there is a one-to-one mapping of input to processor, but I'm not sure that will hold in future. And this is more bad downcasting.
I'd also like not to have to perfectly organise the list of processors; that makes it too easy for other developers to make mistakes here. So the next processor selected should be the one that has the correct constructor.
You could try a decorator approach like this:
public interface IChainProcessor
{
IChainOutput Run(IChainOutput previousOutput);
}
public interface IChainOutput
{
string Value { get; }
}
public class OutputExample : IChainOutput
{
public string Value { get; }
public OutputExample(string value)
{
this.Value = value;
}
}
public abstract class Processor : IChainProcessor
{
protected IChainProcessor nextProcessor;
public IChainOutput Run(IChainOutput previousOutput)
{
var myOutput = this.MyLogic(previousOutput);
return this.nextProcessor == null ? myOutput : this.nextProcessor.Run(myOutput);
}
protected abstract IChainOutput MyLogic(IChainOutput input);
}
public class ProcessorA : Processor
{
public ProcessorA() { }
public ProcessorA(ProcessorB nextProcessor)
{
this.nextProcessor = nextProcessor;
}
protected override IChainOutput MyLogic(IChainOutput input)
{
return new OutputExample($"{input.Value} + Processor_A_Output");
}
}
public class ProcessorB : ProcessorA
{
public ProcessorB() { }
public ProcessorB(ProcessorC nextProcessor)
{
this.nextProcessor = nextProcessor;
}
protected override IChainOutput MyLogic(IChainOutput input)
{
return new OutputExample($"{input.Value} + Processor_B_Output");
}
}
public class ProcessorC : ProcessorB
{
protected override IChainOutput MyLogic(IChainOutput input)
{
return new OutputExample($"{input.Value} + Processor_C_Output");
}
}
The usage would be something like the below:
private static int Main(string[] args)
{
var chain = new ProcessorA(new ProcessorB(new ProcessorC()));
var simpleChain = new ProcessorA(new ProcessorC());
var verySimpleChain = new ProcessorA();
var initialInput = new OutputExample("Start");
Console.WriteLine(chain.Run(initialInput).Value);
Console.WriteLine(simpleChain.Run(initialInput).Value);
Console.WriteLine(verySimpleChain.Run(initialInput).Value);
return 0;
}
The output of this example is:
Start + Processor_A_Output + Processor_B_Output + Processor_C_Output
Start + Processor_A_Output + Processor_C_Output
Start + Processor_A_Output
The abstract Processor class provides a template method that subclasses implement, so every ProcessorX class only defines MyLogic(IChainOutput input).
The processors extend each other to enforce compile-time preservation of processor order: it is impossible to build a chain where ProcessorB comes before ProcessorA. It is possible, though, to build a chain that omits some processors, as in the above example.
The example I provide here does not cater for the final output, which I know is one of your main concerns. To deal with that issue I would rather build a mapping class to convert IChainOutput into the final format (I don't know the real structure of your data, so maybe this is not possible).
Regarding your comment that "in some of my cases it would make sense to have the output of one processor be the input for multiple other processors": using this pattern it would also be possible to construct a processor 'tree' rather than a chain, by allowing the Processor class to have a list of next steps (a rough sketch follows below). Your usage would then become something like this:
var chain = new ProcessorA(new ProcessorB(new ProcessorC()), new ProcessorB(new ProcessorD()));
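Here is a rough sketch of that tree variant (entirely mine, and it drops the compile-time ordering trick above); how the branch results get combined is left as a design decision:

public abstract class TreeProcessor : IChainProcessor
{
    private readonly List<IChainProcessor> nextSteps = new List<IChainProcessor>();

    protected TreeProcessor(params IChainProcessor[] nextSteps)
    {
        this.nextSteps.AddRange(nextSteps);
    }

    public IChainOutput Run(IChainOutput previousOutput)
    {
        var myOutput = this.MyLogic(previousOutput);
        var last = myOutput;
        foreach (var step in nextSteps)
        {
            last = step.Run(myOutput); // every branch receives this node's output
        }
        return last;
    }

    protected abstract IChainOutput MyLogic(IChainOutput input);
}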
I hope this can help you.
If I understood your explanation correctly, you can use delegates to overcome your problem. One important point about delegates is that they can be chained together, so a single invocation can call any number of methods.
Each processor transforms a specific input into a specific output, so a processor implementation should know only those two types:
public interface IStepProcessor<TInput, TOutput>
{
TOutput Process(TInput input);
}
The client code ideally should know only two types of data: the input data and the final product. The client code doesn't care whether there were intermediary steps in the middle; it uses the conveyor as a black box:
public delegate TOutput Conveyor<TInput, TOutput>(TInput input);
Yet some external code should understand how the whole transformation is done. This code should know all the intermediate data types and have access to all intermediate processors. That is best done with dependency injection:
public class Factory
{
private readonly IStepProcessor<IDataInput, IDataStep1> m_Step1;
private readonly IStepProcessor<IDataStep1, IDataStep2> m_Task2;
private readonly IStepProcessor<IDataStep2, IDataStep3> m_Task3;
private readonly IStepProcessor<IDataStep3, IDataStepN> m_TaskN;
private readonly IStepProcessor<IDataStepN, IDataOutput> m_FinalTask;
public Factory(
IStepProcessor<IDataInput, IDataStep1> task1,
IStepProcessor<IDataStep1, IDataStep2> task2,
IStepProcessor<IDataStep2, IDataStep3> task3,
IStepProcessor<IDataStep3, IDataStepN> taskN,
IStepProcessor<IDataStepN, IDataOutput> finalTask
)
{
m_Step1 = task1;
m_Task2 = task2;
m_Task3 = task3;
m_TaskN = taskN;
m_FinalTask = finalTask;
}
public Conveyor<IDataInput, IDataOutput> BuildConveyor()
{
return (input) =>
{
return m_FinalTask.Process(
m_TaskN.Process(
m_Task3.Process(
m_Task2.Process(
m_Step1.Process(input)))));
};
}
}
Here goes my offer (IStepProcessor, the Conveyor delegate and Factory are as defined above):
public interface IDataInput { }
public interface IDataStep1 { }
public interface IDataStep2 { }
public interface IDataStep3 { }
public interface IDataStepN { }
public interface IDataOutput { }
public class Client
{
private readonly Conveyor<IDataInput, IDataOutput> m_Conveyor;
public Client(Conveyor<IDataInput, IDataOutput> conveyor)
{
m_Conveyor = conveyor;
}
public void DealWithInputAfterTransformingIt(IDataInput input)
{
var output = m_Conveyor(input);
Console.Write($"Mind your business here {typeof(IDataOutput).IsAssignableFrom(output.GetType())}");
}
}
public class Program
{
    public void StartingPoint()
    {
        // pseudo-code: ISomeDIContainer stands in for your DI container of choice
        ISomeDIContainer container = CreateDI();
container.Register<IStepProcessor<IDataInput, IDataStep1>, Step1Imp>();
container.Register<IStepProcessor<IDataStep1, IDataStep2>, Step2Imp>();
container.Register<IStepProcessor<IDataStep2, IDataStep3>, Step3Imp>();
container.Register<IStepProcessor<IDataStep3, IDataStepN>, StepNImp>();
container.Register<IStepProcessor<IDataStepN, IDataOutput>, StepOImp>();
container.Register<Factory>();
Factory factory = container.Resolve<Factory>();
var conveyor = factory.BuildConveyor();
var client = new Client(conveyor);
}
}
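For illustration, the same wiring can also be done by hand without a container, using the same Step1Imp ... StepOImp implementations registered above (the input variable is hypothetical):

var factory = new Factory(new Step1Imp(), new Step2Imp(), new Step3Imp(),
                          new StepNImp(), new StepOImp());
var client = new Client(factory.BuildConveyor());
// client.DealWithInputAfterTransformingIt(input); // input: any IDataInput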

General design guidance C#: finding I'm unnecessarily passing objects between methods

Sorry, it's a bit vague perhaps, but it's been bugging me for weeks. I find with each project I tackle I end up making what I think is a design mistake, and I'm pretty sure there's a better way.
Say I'm defining a class that's deserialized from an event source, like a simple JSON doc definition. Let's call it a Keys class with various defined integers, bools and strings. I have multiple methods that make use of this, and I find that I constantly need to pass this class as an object by means of an overload. So method A calls method B; method B doesn't need these objects, but it calls method C, which does... In doing this bad practice, I'm passing these 'Keys' objects to method B for the sole purpose of method C's accessibility.
I'm probably missing one major OOP fundamental :) any guidance or reading would be appreciated, as I'm googled out!
public class Keys
{
public child Detail { get; set; }
}
public class child
{
public string instance { get; set; }
}
//my main entry point
public void FunctionHandler(Keys input, ILambdaContext context)
{
    methodA(input);
}
static void methodA(Keys input)
{
    // some other logic or test that doesn't need the Keys object/class,
    // e.g. if (foo == bar) { proceed = true; }
    string foo = methodB(input);
}
static string methodB(Keys input)
{
    // here I need Keys to do stuff, and I return a string in this example
    return input.Detail.instance;
}
What you do is not necessarily bad or wrong. Remember that in C# what you actually pass are references, not objects proper, so the overhead of parameter passing is really small.
The main downside of long call chains is that the program logic is perhaps more complicated than it needs to be, with the usual maintainability issues.
Sometimes you can use the C# type system to let the compiler or the run time choose the proper function.
The compiler is employed when you overload method() for two different types instead of defining methodA() and methodB(). The overloads are distinguished by their parameter type, so you need different Key types, which may be (but don't have to be) related:
public class KeyA {/*...*/}
public class KeyB {/*...*/}
void method(KeyA kA) { /* do something with kA */ }
void method(KeyB kB) { /* do something with kB */ }
This is of limited benefit; that the functions have the same name is just syntactic sugar which makes it clear that they serve the same purpose.
The other, perhaps more elegant and versatile technique is to create an inheritance hierarchy of Keys which each "know" what a method should do.
You'll need a base class with a virtual method which will be overridden by the inheriting classes. Often the base is an interface just declaring that there is some method(), and the various implementing types implement a method() which suits them. Here is a somewhat lengthy example which uses a virtual Output() method so that we see something on the Console.
It's noteworthy that each Key calls a method of an OutputterI, passing itself to it as a parameter; the outputter class then in turn calls back a method of the calling object. That's called "Double Dispatch" and combines run-time polymorphism with compile-time function overloading. At compile time the object and its concrete type are not known; in fact, they can be implemented later (e.g. by inventing another Key). But each object knows what to do when its callback function (here: GetAData() etc.) is called.
using System;
using System.Collections.Generic;
namespace DoubleDispatch
{
interface KeyI
{ // They actually delegate that to an outputter
void Output();
}
interface OutputterI
{
void Output(KeyA kA);
void Output(KeyExtra kE);
void Output(KeyI k); // whatever this does.
}
class KeyBase: KeyI
{
protected OutputterI o;
public KeyBase(OutputterI oArg) { o = oArg; }
// This will call Output(KeyI)
public virtual void Output() { o.Output(this); }
}
class KeyA : KeyBase
{
public KeyA(OutputterI oArg) : base(oArg) { }
public string GetAData() { return "KeyA Data"; }
// This will compile to call Output(KeyA kA) because
// we pass this which is known here to be of type KeyA
public override void Output() { o.Output(this); }
}
class KeyExtra : KeyBase
{
public string GetEData() { return "KeyB Data"; }
public KeyExtra(OutputterI oArg) : base(oArg) { }
/** Some extra data which needs to be handled during output. */
public string GetExtraInfo() { return "KeyB Extra Data"; }
// This will, as is desired,
// compile to call o.Output(KeyExtra)
public override void Output() { o.Output(this); }
}
class KeyConsolePrinter : OutputterI
{
// Note: No way to print KeyBase.
public void Output(KeyA kA) { Console.WriteLine(kA.GetAData()); }
public void Output(KeyExtra kE)
{
Console.Write(kE.GetEData() + ", ");
Console.WriteLine(kE.GetExtraInfo());
}
// default method for other KeyI
public void Output(KeyI otherKey) { Console.WriteLine("Got an unknown key type"); }
}
// similar for class KeyScreenDisplayer{...} etc.
class DoubleDispatch
{
static void Main(string[] args)
{
KeyConsolePrinter kp = new KeyConsolePrinter();
KeyBase b = new KeyBase(kp);
KeyBase a = new KeyA(kp);
KeyBase e = new KeyExtra(kp);
// Uninteresting, direct case: We know at compile time
// what each object is and could simply call kp.Output(a) etc.
Console.Write("base:\t\t");
b.Output();
Console.Write("KeyA:\t\t");
a.Output();
Console.Write("KeyExtra:\t");
e.Output();
List<KeyI> list = new List<KeyI>() { b, a, e };
Console.WriteLine("\nb,a,e through KeyI:");
// Interesting case: We would normally not know which
// type each element in the vector has. But each type's specific
// Output() method is called -- and we know it must have
// one because that's part of the interface signature.
// Inside each type's Output() method in turn, the correct
// OutputterI::Output() for the given real type was
// chosen at compile time depending on the type of the respective
// "this" argument.
foreach (var k in list) { k.Output(); }
}
}
}
Sample output:
base: Got an unknown key type
KeyA: KeyA Data
KeyExtra: KeyB Data, KeyB Extra Data
b,a,e through KeyI:
Got an unknown key type
KeyA Data
KeyB Data, KeyB Extra Data

C# Singleton Pattern over Inherited Classes

I'll begin this question by apologizing for the length of the post. So that I save you some time: my problem is that the class pattern I've got stuck in my head is obviously flawed, and I can't see a good solution.
In a project I'm working on, I need to run algorithms on chunks of data; let's call them DataCache. Sometimes these algorithms return results that themselves need to be cached, and so I devised a scheme.
I have an Algorithm base class that looks like so:
abstract class Algorithm<T>
{
protected abstract T ExecuteAlgorithmLogic(DataCache dataCache);
private readonly Dictionary<DataCache, WeakReference> _resultsWeak = new Dictionary<DataCache, WeakReference>();
private readonly Dictionary<DataCache, T> _resultsStrong = new Dictionary<DataCache, T>();
public T ComputeResult(DataCache dataCache, bool save = false)
{
if (_resultsStrong.ContainsKey(dataCache))
return _resultsStrong[dataCache];
if (_resultsWeak.ContainsKey(dataCache))
{
var temp = _resultsWeak[dataCache].Target;
if (temp != null) return (T) temp;
}
var result = ExecuteAlgorithmLogic(dataCache);
_resultsWeak[dataCache] = new WeakReference(result, true);
if (save) _resultsStrong[dataCache] = result;
return result;
}
}
If you call ComputeResult() and provide a DataCache, you can optionally choose to cache the result. Also, if you are lucky, the result might still be there if the GC hasn't collected it yet. The size of each DataCache is in the hundreds of megabytes, and before you ask: there are about 10 arrays in each, which hold basic types such as int and float.
My idea here was that an actual algorithm would look something like this:
class ActualAgorithm : Algorithm<SomeType>
{
protected override SomeType ExecuteAlgorithmLogic(DataCache dataCache)
{
//Elves be here
}
}
And I would define tens of .cs files, each for one algorithm. There are two problems with this approach. Firstly, in order for this to work I need to instantiate my algorithms and keep those instances around (or the results are not cached and the entire point is moot). But then I end up with an unsightly singleton pattern implementation in each derived class. It would look something like so:
class ActualAgorithm : Algorithm<SomeType>
{
protected override SomeType ExecuteAlgorithmLogic(DataCache dataCache)
{
//Elves and dragons be here
}
protected ActualAgorithm(){ }
private static ActualAgorithm _instance;
public static ActualAgorithm Instance
{
get
{
_instance = _instance ?? new ActualAgorithm();
return _instance;
}
}
}
So in each implementation I would have to duplicate the code for the singleton pattern. And secondly, tens of .cs files also sounds a bit overkill, since what I'm really after is just a single function returning some results that can be cached for various DataCache objects. Surely there must be a smarter way of doing this, and I would greatly appreciate a nudge in the right direction.
What I meant with my comment was something like this:
abstract class BaseClass<K,T> where T : BaseClass<K,T>, new()
{
private static T _instance;
public static T Instance
{
get
{
_instance = _instance ?? new T();
return _instance;
}
}
}
class ActualClass : BaseClass<int, ActualClass>
{
public ActualClass() {}
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine(ActualClass.Instance.GetType().ToString());
Console.ReadLine();
}
}
The only problem here is that you'll have a public constructor.
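One hedged workaround (my suggestion, not part of the answer above): drop the new() constraint and create the instance reflectively, so derived classes can keep a non-public constructor:

abstract class BaseClass<K, T> where T : BaseClass<K, T>
{
    // nonPublic: true lets Activator reach a private/protected constructor
    private static readonly Lazy<T> _lazy =
        new Lazy<T>(() => (T)Activator.CreateInstance(typeof(T), nonPublic: true));

    public static T Instance
    {
        get { return _lazy.Value; }
    }
}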
I refined my previous answer, but as it is rather different from the other approach I proposed, I thought I might just make it another answer. First, we'll need to declare some interfaces:
// Where to find cached data
interface DataRepository {
void cacheData(Key k, Data d);
Data retrieveData(Key k, Data d);
};
// If by any chance we need an algorithm somewhere
interface AlgorithmRepository {
Algorithm getAlgorithm(Key k);
}
// The algorithm that process data
interface Algorithm {
void processData(Data in, Data out);
}
Given these interfaces, we can define some basic implementation for the algorithm repository:
class BaseAlgorithmRepository {
// The algorithm dictionary
Map<Key, Algorithm> algorithms;
// On init, we'll build our repository using this function
void setAlgorithmForKey(Key k, Algorithm a) {
algorithms.put(k, a);
}
// ... implement the other function of the interface
}
Then we can also implement something for the DataRepository
class DataRepository {
AlgorithmRepository algorithmRepository;
Map<Key, Data> cache;
void cacheData(Key k, Data d) {
cache.put(k, d);
}
Data retrieveData(Key k, Data in) {
    Data d = cache.get(k);
    if (d == null) {
        // Data not found in the cache, so we try to produce it ourselves
        d = new Data();
        Algorithm a = algorithmRepository.getAlgorithm(k);
        a.processData(in, d);
        // This is optional: you could instead throw an exception to say that the
        // data has not been cached and thus the algorithm succession did not
        // produce the necessary data. So instead of the above, you could simply:
        // throw new DataNotCached(k);
        // and thus halt the whole processing
    }
    return d;
}
}
Finally, we get to implement algorithms:
abstract class BaseAlgorithm implements Algorithm {
DataRepository repository;
}
class SampleNoCacheAlgorithm extends BaseAlgorithm {
void processData(Data in, Data out) {
// do something with in to compute out
}
}
class SampleCacheProducerAlgorithm extends BaseAlgorithm {
static Key KEY = "SampleCacheProducerAlgorithm.myKey";
void processData(Data in, Data out) {
// do something with in to compute out
// then call repository.cacheData(KEY, out);
}
}
class SampleCacheConsumerAlgorithm extends BaseAlgorithm {
void processData(Data in, Data out) {
// Data tmp = repository.retrieveData(SampleCacheProducerAlgorithm.KEY, in);
// do something with in and tmp to compute out
}
}
To build on this, I think you could also define some special kinds of algorithms that are just in fact composites of other algorithms but also implement the Algorithm interface. An example could be:
class AlgorithmChain extends BaseAlgorithm {
    List<Algorithm> chain;
    void processData(Data in, Data out) {
        Data currentIn = in;
        Data currentOut = null;
        foreach (Algorithm a : chain) {
            currentOut = new Data();
            a.processData(currentIn, currentOut);
            currentIn = currentOut;
        }
        out = currentOut;
    }
}
}
One addition I would make to this is a DataPool, which would allow you to reuse existing but unused Data objects in order to avoid allocating lots of memory each time you make a new Data().
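A minimal C# sketch of that DataPool idea (the names and the reset step are my assumptions):

public class DataPool
{
    private readonly Stack<Data> _free = new Stack<Data>();

    public Data Acquire()
    {
        // reuse a pooled instance when one exists, otherwise allocate
        return _free.Count > 0 ? _free.Pop() : new Data();
    }

    public void Release(Data d)
    {
        // reset d's fields here before handing it out again
        _free.Push(d);
    }
}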
I think this set of classes could give a good basis for your whole architecture, with the additional benefit that it does not employ any Singleton (references to the concerned objects are always passed explicitly). This also means that implementing dummy classes for unit tests would be rather easy.
You could have your algorithms independent of their results:
class Engine<T> {
Map<AlgorithmKey, Algorithm<T>> algorithms;
Map<AlgorithmKey, Data> algorithmsResultCache;
T processData(Data in);
}
interface Algorithm<T> {
boolean doesResultNeedsToBeCached();
T processData(Data in);
}
Then your Engine is responsible for instantiating the algorithms, which are only pieces of code where the input is data and the output is either null or some data. Each algorithm can say whether its result needs to be cached or not.
In order to refine my answer, I think you should give some precision about how the algorithms are to be run (is there an order, is it user-adjustable, do we know in advance which algorithms will be run, ...).
Can you register your algorithm instances with a combined repository/factory of algorithms that'll keep references to them? The repository could be a singleton, and, if you give the repository control of algorithm instantiation, you could use it to ensure that only one instance of each existed.
public class AlgorithmRepository
{
//... use boilerplate singleton code
public void CreateAlgorithm(Algorithms algorithm)
{
//... add to some internal hash or map, checking that it hasn't been created already
//... Algorithms is just an enum telling it which to create (clunky factory
// implementation)
}
public void ComputeResult(Algorithms algorithm, DataCache datacache)
{
// Can lazy-load algorithms here and make CreateAlgorithm private ..
CreateAlgorithm(algorithm);
//... compute and return.
}
}
This said, having a separate class (and .cs file) for each algorithm makes sense to me. You could break with convention and keep multiple algorithm classes in a single .cs file if they're lightweight and it makes them easier to manage, if you're worried about the number of files -- there are worse things to do. FWIW I'd just put up with the number of files...
Typically when you create a Singleton class you don't want to inherit from it. When you do, you lose some of the goodness of the Singleton pattern (and what I hear from the pattern zealots is that an angel loses its wings every time you do something like this). But let's be pragmatic... sometimes you do what you have to do.
Regardless, I do not think combining generics and inheritance will work in this instance anyway.
You indicated the number of algorithms will be in the tens (not hundreds). As long as this is the case, I would create a dictionary keyed off of System.Type and store references to your methods as the values of the dictionary. In this case I used Func<DataCache, object> as the dictionary value signature.
When the class instantiates for the first time, register all your available algorithms in the dictionary. At runtime, when the class needs to execute an algorithm for type T, it will get the Type of T and look up the algorithm in the dictionary.
If the code for the algorithms is relatively involved, I would suggest splitting them off into partial classes just to keep your code readable.
public sealed partial class Algorithm<T>
{
private static object ExecuteForSomeType(DataCache dataCache)
{
return new SomeType();
}
}
public sealed partial class Algorithm<T>
{
private static object ExecuteForSomeOtherType(DataCache dataCache)
{
return new SomeOtherType();
}
}
public sealed partial class Algorithm<T>
{
private readonly Dictionary<System.Type, Func<DataCache, object>> _algorithms = new Dictionary<System.Type, Func<DataCache, object>>();
private readonly Dictionary<DataCache, WeakReference> _resultsWeak = new Dictionary<DataCache, WeakReference>();
private readonly Dictionary<DataCache, T> _resultsStrong = new Dictionary<DataCache, T>();
private Algorithm() { }
private static Algorithm<T> _instance;
public static Algorithm<T> Instance
{
get
{
if (_instance == null)
{
_instance = new Algorithm<T>();
_instance._algorithms.Add(typeof(SomeType), ExecuteForSomeType);
_instance._algorithms.Add(typeof(SomeOtherType), ExecuteForSomeOtherType);
}
return _instance;
}
}
public T ComputeResult(DataCache dataCache, bool save = false)
{
    if (_resultsStrong.ContainsKey(dataCache))
        return _resultsStrong[dataCache];
    if (_resultsWeak.ContainsKey(dataCache))
    {
        var cached = _resultsWeak[dataCache].Target;
        if (cached != null) return (T)cached;
    }
    // look the algorithm up by the type parameter instead of via a dummy instance
    T returnValue = (T)_algorithms[typeof(T)](dataCache);
    _resultsWeak[dataCache] = new WeakReference(returnValue, true);
    if (save) _resultsStrong[dataCache] = returnValue;
    return returnValue;
}
}
First off, I'd suggest you rename DataCache to something like DataInput for more clarity, because it's easy to confuse it with objects that really act as caches (_resultsWeak and _resultsStrong) to store the results.
Concerning the need for these caches to remain in memory for future use, maybe you should consider placing them in one of the wider scopes that exist in a .NET application than the object scope, Application or Session for example.
You could also use an AlgorithmLocator (see ServiceLocator pattern) as a single point of access to all Algorithms to get rid of the singleton logic duplication in each Algorithm.
Other than that, I find your solution to be a nice one overall. Whether or not it is overkill will basically depend on the homogeneity of your algorithms. If they all have the same way of caching data and of returning their results, it will be a great benefit to have all that logic factored out in a single place. But we lack the context to judge that here.
Encapsulating the caching logic in a specific object held by the Algorithm (a CachingStrategy?) would also be an alternative to inheriting it, but it may be a bit awkward, since the caching object would have to access the cache before and after the calculation, and would need to be able to trigger the algorithm calculation itself and have a hand on the results.
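For what it's worth, a rough shape for that CachingStrategy idea might be (entirely my sketch, names hypothetical):

public interface ICachingStrategy<T>
{
    // true (and result set) when a value for this input is already cached
    bool TryGet(DataCache input, out T result);

    // store a freshly computed result; pin keeps a strong reference
    void Store(DataCache input, T result, bool pin);
}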
[Edit] If you're concerned about having one .cs file per algorithm, you can always group all Algorithm classes pertaining to a particular T in the same file.

Initializing constructor from stored cache in C#

I'm not sure exactly how to describe this question, but here goes. I've got a class hierarchy of objects that are mapped in a SQLite database. I've already got all the non-trivial code written that communicates between the .NET objects and the database.
I've got a base interface as follows:
public interface IBackendObject
{
void Read(int id);
void Refresh();
void Save();
void Delete();
}
These are the basic CRUD operations on any object. I've then implemented a base class that encapsulates much of the functionality:
public abstract class ABackendObject : IBackendObject
{
protected ABackendObject() { } // constructor used to instantiate new objects
protected ABackendObject(int id) { Read(id); } // constructor used to load object
public void Read(int id) { ... } // implemented here is the DB code
}
Now, finally, I have my concrete child objects, each of which have their own tables in the database:
public class ChildObject : ABackendObject
{
public ChildObject() : base() { }
public ChildObject(int id) : base(id) { }
}
This works fine for all my purposes so far. The child has several callback methods that are used by the base class to instantiate the data properly.
I now want to make this slightly more efficient. For example, consider the following code:
public void SomeFunction1()
{
ChildObject obj = new ChildObject(1);
obj.Property1 = "blah!";
obj.Save();
}
public void SomeFunction2()
{
ChildObject obj = new ChildObject(1);
obj.Property2 = "blah!";
obj.Save();
}
In this case, I'll be constructing two completely new in-memory instantiations, and depending on the order in which SomeFunction1 and SomeFunction2 are called, either Property1 or Property2 may not be saved. What I want to achieve is a way for both these instantiations to somehow point to the same memory location; I don't think that will be possible while I'm using the "new" keyword, so I was looking for hints as to how to proceed.
Ideally, I'd want to store a cache of all loaded objects in my ABackendObject class and return references to the already-loaded objects when requested, or load the object from the database if it isn't cached yet and add it to the cache. I've got a lot of code that is already using this framework, so I'm of course going to have to change a lot of stuff to get this working, but I just wanted some tips as to how to proceed.
Thanks!
If you want to store a "cache" of loaded objects, you could easily just have each type maintain a Dictionary<int, IBackendObject> which holds loaded objects, keyed by their ID.
Instead of using a constructor, build a factory method that checks the cache:
public abstract class ABackendObject<T> where T : class
{
    // cache lookup, database read and cache store are left to the concrete type
    protected abstract T CheckCache(int id);
    protected abstract T Read(int id);
    protected abstract void SaveToCache(int id, T obj);

    public T LoadFromDB(int id)
    {
        T obj = this.CheckCache(id);
        if (obj == null)
        {
            obj = this.Read(id); // load the object from the database
            this.SaveToCache(id, obj);
        }
        return obj;
    }
}
If you make your base class generic, and Read virtual, you should be able to provide most of this functionality without much code duplication.
What you want is an object factory. Make the ChildObject constructor private, then write a static method ChildObject.Create(int index) which returns a ChildObject, but which internally ensures that different calls with the same index return the same object. For simple cases, a simple static hash of index => object will be sufficient.
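A minimal sketch of that factory approach using the asker's ChildObject (the static dictionary used as the cache is my illustration):

public class ChildObject : ABackendObject
{
    private static readonly Dictionary<int, ChildObject> _cache =
        new Dictionary<int, ChildObject>();

    private ChildObject(int id) : base(id) { }

    public static ChildObject Create(int id)
    {
        ChildObject obj;
        if (!_cache.TryGetValue(id, out obj))
        {
            obj = new ChildObject(id); // first request: load from the database
            _cache[id] = obj;
        }
        return obj;
    }
}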
If you're using .NET Framework 4, you may want to have a look at the System.Runtime.Caching namespace, which gives you a pretty powerful cache architecture.
http://msdn.microsoft.com/en-us/library/system.runtime.caching.aspx
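For illustration, a hedged sketch using MemoryCache from that namespace (the key format and expiration policy are my choices):

using System.Runtime.Caching;

var cache = MemoryCache.Default;
var obj = cache.Get("ChildObject:1") as ChildObject;
if (obj == null)
{
    obj = new ChildObject(1);
    cache.Set("ChildObject:1", obj,
        new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(10) });
}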
Sounds perfect for a reference count like this...
#region Begin/End Update
int refcount = 0;
ChildObject record;
protected ChildObject ActiveRecord
{
get
{
return record;
}
set
{
record = value;
}
}
public void BeginUpdate()
{
if (refcount == 0)
{
ActiveRecord = new ChildObject(1);
}
Interlocked.Increment(ref refcount);
}
public void EndUpdate()
{
int count = Interlocked.Decrement(ref refcount);
if (count == 0)
{
ActiveRecord.Save();
}
}
#endregion
#region operations
public void SomeFunction1()
{
BeginUpdate();
try
{
ActiveRecord.Property1 = "blah!";
}
finally
{
EndUpdate();
}
}
public void SomeFunction2()
{
BeginUpdate();
try
{
ActiveRecord.Property2 = "blah!";
}
finally
{
EndUpdate();
}
}
// composed operation: both updates share a single BeginUpdate/EndUpdate scope
public void SomeFunction3()
{
BeginUpdate();
try
{
SomeFunction1();
SomeFunction2();
}
finally
{
EndUpdate();
}
}
#endregion
I think you're on the right track, more or less. You can either create a factory which creates your child objects (and can track "live" instances), or you can keep track of instances which have been saved, so that when you call your Save method it recognizes that your first instance of ChildObject is the same as your second instance and does a deep copy of the data from the second instance over to the first. Both of these are fairly non-trivial from a coding standpoint, and both probably involve overriding the equality methods on your entities. I tend to think that using the first approach would be less likely to cause errors.
One additional option would be to use an existing Object-Relational Mapping package like NHibernate or Entity Framework to do your mapping between objects and your database. I know NHibernate supports SQLite, and in my experience it tends to be the one that requires the fewest changes to your entity structures. Going that route you get the benefit of the ORM layer tracking instances for you (and generating SQL for you), plus you would probably get some more advanced features your current data access code may not have. The downside is that these frameworks tend to have a learning curve, and depending on which you go with there could be a not insignificant impact on the rest of your code. So it would be worth weighing the benefits against the cost of learning the framework and converting your code to use the API.

Designing different Factory classes (and what to use as argument to the factories!)

Let's say we have the following piece of code:
public class Event { }
public class SportEvent1 : Event { }
public class SportEvent2 : Event { }
public class MedicalEvent1 : Event { }
public class MedicalEvent2 : Event { }
public interface IEventFactory
{
bool AcceptsInputString(string inputString);
Event CreateEvent(string inputString);
}
public class EventFactory
{
private List<IEventFactory> factories = new List<IEventFactory>();
public void AddFactory(IEventFactory factory)
{
factories.Add(factory);
}
//I don't see a point in defining a RemoveFactory() so I won't.
public Event CreateEvent(string inputString)
{
try
{
//iterate through all factories. If one and only one of them accepts
//the string, generate the event. Otherwise, throw an exception.
return factories.Single(factory => factory.AcceptsInputString(inputString)).CreateEvent(inputString);
}
catch (InvalidOperationException e)
{
throw new InvalidOperationException("Either there was no valid factory available or there was more than one for the specified kind of Event.", e);
}
}
}
public class SportEvent1Factory : IEventFactory
{
public bool AcceptsInputString(string inputString)
{
return inputString.StartsWith("SportEvent1");
}
public Event CreateEvent(string inputString)
{
return new SportEvent1();
}
}
public class MedicalEvent1Factory : IEventFactory
{
public bool AcceptsInputString(string inputString)
{
return inputString.StartsWith("MedicalEvent1");
}
public Event CreateEvent(string inputString)
{
return new MedicalEvent1();
}
}
And here is the code that runs it:
static void Main(string[] args)
{
EventFactory medicalEventFactory = new EventFactory();
medicalEventFactory.AddFactory(new MedicalEvent1Factory());
medicalEventFactory.AddFactory(new MedicalEvent2Factory());
EventFactory sportsEventFactory = new EventFactory();
sportsEventFactory.AddFactory(new SportEvent1Factory());
sportsEventFactory.AddFactory(new SportEvent2Factory());
}
I have a couple of questions:
1. Instead of having to add factories here in the main method of my application, should I try to redesign my EventFactory class so it is an abstract factory? It'd be better if I had a way of not having to manually add EventFactories every time I want to use them, so I could just instantiate MedicalFactory and SportsFactory. Should I make a factory of factories? Maybe that'd be over-engineering?
2. As you have probably noticed, I am using an inputString string as the argument to feed the factories. I have an application that lets the user create his own events but also load/save them from text files. Later, I might want to add other kinds of sources: XML, SQL connections, whatever. The only way I can think of to make this work is having an internal format (I chose a string, as it's easy to understand). How would you do this? I assume this is a recurrent situation; probably some of you know a more intelligent approach. Note that the EventFactory just loops over all the factories in its list to check whether any of them accepts the input string; if one does, it asks it to generate the Event.
If you find there is something wrong or awkward with the method I'm using to make this happen, I'd be happy to hear about different implementations. Thanks!
PS: Although I don't show it here, all the different kinds of events have different properties, so I have to generate them with different arguments (SportEvent1 might have SportName and Duration properties, which have to be put in the inputString as arguments).
I am not sure about the input string question, but for the first question you can likely use "convention over configuration": a combination of reflection, the IEventFactory type and the naming convention you already have in place (Name.EndsWith("Factory")) should allow you to instantiate the factories and add them to their lists in code, as sketched below.
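A hedged sketch of that reflection-based registration (the assembly-scanning details are my assumptions):

using System;
using System.Linq;
using System.Reflection;

var factory = new EventFactory();
var factoryTypes = Assembly.GetExecutingAssembly().GetTypes()
    .Where(t => typeof(IEventFactory).IsAssignableFrom(t)
                && !t.IsAbstract && !t.IsInterface
                && t.Name.EndsWith("Factory"));
foreach (var type in factoryTypes)
{
    // each concrete *Factory is created and registered automatically
    factory.AddFactory((IEventFactory)Activator.CreateInstance(type));
}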
HTH, Berryl
