Is the use of the .NET Lazy&lt;T&gt; class overkill in this case? (C#)

I learned about the Lazy&lt;T&gt; class in .NET recently and have probably been over-using it. In the example below, things could have been evaluated eagerly, but that would repeat the same calculation if called over and over. In this particular example the cost of using Lazy might not justify the benefit; I am not sure about this, since I do not yet understand how expensive lambdas and lazy invocation are.

I like using chained Lazy properties, because I can break complex logic into small, manageable chunks. I also no longer need to think about where the best place to initialize stuff is: all I need to know is that nothing will be initialized unless I use it, and it will be initialized exactly once before I start using it. However, once I start using Lazy and lambdas, what was a simple class becomes more complex. I cannot objectively decide when this is justified and when it is overkill in terms of complexity, readability, and possibly speed. What would your general recommendation be?
// This is set once during initialization.
// The other 3 properties are derived from this one.
// Ends in .dat
public string DatFileName
{
    get;
    private set;
}

private Lazy<string> DatFileBase
{
    get
    {
        // Removes .dat
        return new Lazy<string>(() => Path.GetFileNameWithoutExtension(this.DatFileName));
    }
}

public Lazy<string> MicrosoftFormatName
{
    get
    {
        return new Lazy<string>(() => this.DatFileBase.Value + "_m.fmt");
    }
}

public Lazy<string> OracleFormatName
{
    get
    {
        return new Lazy<string>(() => this.DatFileBase.Value + "_o.fmt");
    }
}

This is probably a little bit of overkill.
Lazy should usually be used when the generic type is expensive to create or evaluate, and/or when the generic type is not always needed in every usage of the dependent class.
More than likely, anything calling your getters here will need an actual string value immediately upon calling the getter. Returning a Lazy in such a case is unnecessary, as the calling code will simply evaluate the Lazy instance immediately to get what it really needs. The "just-in-time" nature of Lazy is wasted here, and therefore, YAGNI (You Aren't Gonna Need It).
That said, the "overhead" inherent in Lazy isn't all that much. A Lazy is little more than a class referencing a delegate that will produce the generic type. Lambdas are relatively cheap to define and execute; they're just methods, given compiler-generated names when the code is compiled. The instantiation of the extra class is the main kicker, and even then it's not terrible. However, it's unnecessary overhead from both a coding and a performance perspective.

You said "i no longer need to think about where is the best place to initialize stuff".
This is a bad habit to get in to. You should know exactly what's going on in your program
You should Lazy<> when there's an object that needs to be passed, but requires some computation.
So only when it will be used it will be calculated.
Besides that, you need to remember that the object you retrieve with the lazy is not the object that was in the program's state when it was requested.
You'll get the object itself only when it will be used. This will be hard to debug later on if you get objects that are important to the program's state.
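To make that concrete, here is a small sketch (the FileHolder type is made up for illustration) showing that a Lazy value reflects the state at first access, not the state at the time the Lazy was created:
var holder = new FileHolder { DatFileName = "first.dat" };   // hypothetical type
var lazyBase = new Lazy<string>(
    () => Path.GetFileNameWithoutExtension(holder.DatFileName));

holder.DatFileName = "second.dat";   // state changes before first use

Console.WriteLine(lazyBase.Value);   // prints "second", not "first"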

This does not appear to be using Lazy<T> for the purpose of saving creation/loading of an expensive object so much as it is to (perhaps unintentionally) be wrapping some arbitrary delegate for delayed execution. What you probably want/intend your derived property getters to return is a string, not a Lazy<string> object.
If the calling code looks like
string fileName = MicrosoftFormatName.Value;
then there is obviously no point, since you are "Lazy-Loading" immediately.
If the calling code looks like
var lazyName = MicrosoftFormatName; // Not yet evaluated
// some other stuff, maybe changing the value of DatFileName
string fileName2 = lazyName.Value;
then you can see that fileName2 might not be determinable at the point where the lazyName object is created.
It seems to me that Lazy<T> isn't best used for public properties; here your getters are returning new (as in brand new, distinct, extra) Lazy<string> objects, so each caller will (potentially) get a different .Value! All of your Lazy<string> properties depend on DatFileName being set at the time their .Value is first accessed, so you will always need to think about when that is initialized relative to the use of each of the derived properties.
See the MSDN article "Lazy Initialization" which creates a private Lazy<T> backing variable and a public property getter that looks like:
get { return _privateLazyObject.Value; }
Here is what I might guess your code should/might look like, using Lazy<string> to define your "set-once" base property:
// This is set up once (during object initialization) and
// evaluated once (the first time _datFileName.Value is accessed)
private Lazy<string> _datFileName = new Lazy<string>(() =>
{
    string filename = null;
    // Insert initialization code here to determine filename
    return filename;
});

// The other 3 properties are derived from this one.
// Ends in .dat
public string DatFileName
{
    get { return _datFileName.Value; }
    private set { _datFileName = new Lazy<string>(() => value); }
}

private string DatFileBase
{
    get { return Path.GetFileNameWithoutExtension(DatFileName); }
}

public string MicrosoftFormatName
{
    get { return DatFileBase + "_m.fmt"; }
}

public string OracleFormatName
{
    get { return DatFileBase + "_o.fmt"; }
}

Using Lazy for creating simple string properties is indeed overkill. Initializing a Lazy instance with a lambda parameter is probably much more expensive than doing a single string operation. There's one more important argument that others haven't mentioned yet: remember that a lambda parameter is resolved by the compiler into a fairly complex structure, far more complex than a string concatenation.

Another area where lazy loading is a good fit is a type that can be consumed in a partial state. As an example, consider the following:
public class Artist
{
    public string Name { get; set; }
    public Lazy<Manager> Manager { get; internal set; }
}
In the above example, consumers may only need to use the Name property, so fields which may or may not be used could be a place for lazy loading. I say could, not should, as there are always situations where it may be more performant to load everything up front... depending on what your application needs to do.
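As a rough usage sketch (the managerRepository and its GetByArtistName method are assumptions, not part of the example above), the Manager lookup only runs if a consumer actually touches it:
var artist = new Artist
{
    Name = "Some Artist",
    Manager = new Lazy<Manager>(() => managerRepository.GetByArtistName("Some Artist"))
};

Console.WriteLine(artist.Name);       // no Manager load happens here
var manager = artist.Manager.Value;   // the expensive load runs only on this line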

Related

Best way to manage a static property

I have this code:
public static class ProcessClass
{
    public static string MyProp { get; set; }

    public static void ProcessMethod(InputObject input)
    {
        if (String.IsNullOrEmpty(MyProp))
            MyProp = input.Name();
        // do stuff
        MyProp = null;
    }
}
Now, everything except for MyProp was already in place. I needed a way to track the original input.Name throughout the code, since it can change, and also because ProcessMethod can get called again internally with a different input.Name value due to business rules, and I need to know that this second call came from the original input.Name.
Obviously, this is bad: if two people do this at the same time, they will both share the same MyProp value, and simply setting it to null at the end seems dangerous and hacky. I don't really have the option of changing this method to be non-static, because that would involve changing a LOT of the codebase, which isn't really an option.
What are the ways around using a static property while still keeping my original input.Name value, without risking thread-safety issues? One option I am thinking of is to have every method accept an input.Name() and track it that way (and remove MyProp), but I can picture that getting out of hand fast and becoming extremely messy.
I don't need this to be a property, but if it is, it obviously needs to be static in this class.
Since it may be a multi-user environment, replace the string with a ConcurrentDictionary in which you store input.Name() as the value and the unique identifier of the user (an id, a name, etc.) as the key.
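A rough sketch of that idea, assuming you can get hold of some per-user identifier (the userId parameter below is an assumption about your code):
// requires: using System.Collections.Concurrent;
public static class ProcessClass
{
    // Keyed by user, so concurrent callers no longer share one value.
    private static readonly ConcurrentDictionary<string, string> OriginalNames =
        new ConcurrentDictionary<string, string>();

    public static void ProcessMethod(InputObject input, string userId)
    {
        // Remember the first Name seen for this user; later internal calls keep it.
        OriginalNames.TryAdd(userId, input.Name());

        // do stuff, reading OriginalNames[userId] wherever MyProp was used

        string removed;
        OriginalNames.TryRemove(userId, out removed);
    }
}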

Property initialisation anti-pattern

Now and again I end up with code along these lines, where I create some objects then loop through them to initialise some properties using another class...
ThingRepository thingRepos = new ThingRepository();
GizmoProcessor gizmoProcessor = new GizmoProcessor();
WidgetProcessor widgetProcessor = new WidgetProcessor();

public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();
    // Loops through setting thing.Gizmo to a new Gizmo
    gizmoProcessor.AddGizmosToThings(allThings);
    // Loops through setting thing.Widget to a new Widget
    widgetProcessor.AddWidgetsToThings(allThings);
    return allThings;
}
...which just, well, feels wrong.
Is this a bad idea?
Is there a name of an anti-pattern that I'm using here?
What are the alternatives?
Edit: assume that both GizmoProcessor and WidgetProcessor have to go off and do some calculation, and get some extra data from other tables. They're not just data stored in a repository. They're creating new Gizmos and Widgets based on each Thing and assigning them to Thing's properties.
The reason this feels odd to me is that Thing isn't an autonomous object; it can't create itself and child objects. It's requiring higher-up code to create a fully finished object. I'm not sure if that's a bad thing or not!
ThingRepository is supposed to be the single access point for getting collections of Things, or at least that's where developers will intuitively look. For that reason, it feels strange that GetThings(DateTime date) should be provided by another object. I'd rather place that method in ThingRepository itself.
The fact that the Things returned by GetThings(DateTime date) are different, "fatter" animals than those returned by ThingRepository.FetchThings() also feels awkward and counter-intuitive. If Gizmo and Widget are really part of the Thing entity, you should be able to access them every time you have an instance of Thing, not just for instances returned by GetThings(DateTime date).
If the Date parameter in GetThings() isn't important or could be gathered at another time, I would use calculated properties on Thing to implement on-demand access to Gizmo and Widget:
public class Thing
{
    //...
    public Gizmo Gizmo
    {
        get
        {
            // calculations here
        }
    }

    public Widget Widget
    {
        get
        {
            // calculations here
        }
    }
}
Note that this approach is valid as long as the calculations performed are not too costly. Calculated properties with expensive processing are not recommended - see http://msdn.microsoft.com/en-us/library/bzwdh01d%28VS.71%29.aspx#cpconpropertyusageguidelinesanchor1
However, these calculations don't have to be implemented inline in the getters - they can be delegated to third-party Gizmo/Widget processors, potentially with a caching strategy, etc.
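For instance, here is a sketch of a calculated property that delegates to a processor and caches the result (the BuildGizmo method is an assumption about what GizmoProcessor could expose, not code from the question):
public class Thing
{
    private readonly GizmoProcessor _gizmoProcessor;
    private Gizmo _gizmo;   // cached after the first access

    public Thing(GizmoProcessor gizmoProcessor)
    {
        _gizmoProcessor = gizmoProcessor;
    }

    public Gizmo Gizmo
    {
        get
        {
            if (_gizmo == null)
            {
                _gizmo = _gizmoProcessor.BuildGizmo(this);   // calculated on demand
            }
            return _gizmo;
        }
    }
}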
If you have complex initialization then you could use a Strategy pattern. Here is a quick overview adapted from this strategy pattern overview.
Create a strategy interface to abstract the initialization:
public interface IThingInitializationStrategy
{
    void Initialize(Thing thing);
}
The initialization implementations that can be used by the strategy:
public class GizmosInitialization : IThingInitializationStrategy
{
    public void Initialize(Thing thing)
    {
        // Add gizmos here and other initialization
    }
}

public class WidgetsInitialization : IThingInitializationStrategy
{
    public void Initialize(Thing thing)
    {
        // Add widgets here and other initialization
    }
}
And finally a service class that accepts the strategy implementation in an abstract way
internal class ThingInitializationService
{
    private readonly IThingInitializationStrategy _initStrategy;

    public ThingInitializationService(IThingInitializationStrategy initStrategy)
    {
        _initStrategy = initStrategy;
    }

    public void Initialize(Thing thing)
    {
        _initStrategy.Initialize(thing);
    }
}
You can then use the initialization strategies like so:
var initializationStrategy = new GizmosInitialization();
var initializationService = new ThingInitializationService(initializationStrategy);

List<Thing> allThings = thingRepos.FetchThings();
allThings.ForEach(thing => initializationService.Initialize(thing));
The only real potential problem is that you're iterating over the same list multiple times, but if you need to hit a database to get all the gizmos and widgets then it might be more efficient to request them in batches, so passing the full list to your Add... methods would make sense.
The other option would be to look into returning the gizmos and widgets with the thing in the first repository call (assuming they reside in the same repo). It might make the query more complex, but it would probably be more efficient. Unless of course you don't ALWAYS need to get gizmos and widgets when you fetch things.
To answer your questions:
Is this a bad idea?
From my experience, you rarely know if it's a good/bad idea until you need to change it.
IMO, code is either over-engineered, under-engineered, or unreadable.
In the meantime, you do your best and stick to the best practices (KISS, single responsibility, etc.).
Personally, I don't think the processor classes should be modifying the state of any Thing.
I also don't think the processor classes should be given a collection of Things to modify.
Is there a name of an anti-pattern that I'm using here?
Sorry, unable to help.
What are the alternatives?
Personally, I would write the code as such:
public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();

    // Build the gizmo and widget for each thing
    foreach (var thing in allThings)
    {
        thing.Gizmo = gizmoProcessor.BuildGizmo(thing);
        thing.Widget = widgetProcessor.BuildWidget(thing);
    }

    return allThings;
}
My reasons being:
The code is in a class that "Gets things". So logically, I think it's acceptable for it to traverse each Thing object and initialise them.
The intention is clear: I'm initialising the properties for each Thing before returning them.
I prefer initialising any properties of Thing in a central location.
I don't think the gizmoProcessor and widgetProcessor classes should have any business with a collection of Things.
I prefer the processors to have a method that builds and returns a single widget/gizmo.
However, if your processor classes build several properties at once, only then would I refactor the property initialisation into each processor.
public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();

    // Build the gizmo and widget for each thing
    foreach (var thing in allThings)
    {
        // [Edited]
        // Notice a trend here: the common Initialize(Thing) interface
        // could probably be refactored into some
        // super-mega-complex Composite Builder-esque class should you ever want to
        gizmoProcessor.Initialize(thing);
        widgetProcessor.Initialize(thing);
    }

    return allThings;
}
P.s.:
I personally do not care that much for (Anti)Pattern names.
While it helps to discuss a problem at a higher level of abstraction, I wouldn't commit every (anti)pattern names to memory.
When I come across a Pattern that I believe is helpful, then only do I remember it.
I'm quite lazy, and my rationale is: why bother remembering every pattern and anti-pattern if I'm only going to use a handful?
[Edit]
Noticed an answer was already given regarding using a Strategy Service.

When I need some item, should I just use its "int id" instead?

My application has an InstrumentFactory - the only place where I create Instrument instances. Each instrument instance contains several fields, such as Ticker = MSFT and GateId = 1, and also a unique Id = 1.
And now I have realized that I almost never need the Instrument instance itself. In 90% of cases I just need the Id. For example, I currently have this method:
public InstrumentInfo GetInstrumentInfo(Instrument instrument)
{
    return instrumentInfos[instrument.Id];
}
We know that we should not pass more information in parameters than is required. So this code should probably be refactored to:
public InstrumentInfo GetInstrumentInfo(int instrumentId)
{
    return instrumentInfos[instrumentId];
}
90% of my code could now be refactored to use instrumentId instead of Instrument.
Should I do that? Changing Instrument to instrumentId everywhere would turn it into a hard requirement (each Instrument must have exactly one unique id). But what benefits would I get? In return for this "hard requirement" I would like some benefit (speed? readability?), but I don't see one.
Using ids everywhere instead of the object is the wrong approach; it goes against the spirit of OOP.
There are two big advantages to using the object itself:
It's type-safe. You can't accidentally pass something like Person to the first version, but you can accidentally pass person.Id to the second.
It makes your code easy to modify. If, in the future, you decide that you need long ids, or some other way to identify a unique Instrument, you won't need to change the calling code.
And you should probably change your dictionary too: it should be something like Dictionary<Instrument, InstrumentInfo>, not the Dictionary<int, InstrumentInfo> you have now. This way, you get both of the advantages there too. To make it work, you need to implement equality on Instrument, which means properly overriding Equals() and GetHashCode(), and ideally also implementing IEquatable<Instrument>.
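A quick sketch of what that change might look like (the field name is just illustrative):
// The dictionary is keyed by the object itself rather than by its int Id.
private readonly Dictionary<Instrument, InstrumentInfo> instrumentInfos =
    new Dictionary<Instrument, InstrumentInfo>();

public InstrumentInfo GetInstrumentInfo(Instrument instrument)
{
    // Works correctly once Instrument overrides Equals() and GetHashCode().
    return instrumentInfos[instrument];
}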
It's always better to work in terms of objects than primitive values like integers. If tomorrow your requirements happen to change and you need more than just the ID, it is easy to use the other members of the Instrument object instead of changing all your code.
GetInstrumentInfo(int instrumentId);
This probably means that the client code has to call:
GetInstrumentInfo(instrument.Id);
Don't let the users of your method worry about small details like that. Let them just pass the entire object and let your method do the work.
There is no major performance disadvantage either way, whether you pass an int or a reference to the actual object.
And say you wanted to develop GetInstrumentInfo a bit further: it's easier to have access to the entire object than just an int.
The first thing you need to ask yourself is this:
"If I have two instruments with ID == 53, then does that mean they are definitely the same instrument, no matter what? Or is there a meaningful case where they could be different?"
Assuming the answer is "they are both the same. If any other property differs, that is either a bug or because one such object was obtained after another, and that will resolve itself soon enough (when whatever thread of processing is using the older instrument, stops using it)" then:
First, internally, just use whatever you find handier. You'll quite likely find that this is to go by the int all the time, though you get some type-safety from insisting that an Instrument is passed to the method. This is especially true if all Instrument construction happens from an internal or private constructor accessed via factory methods, and there is no way for a user of the code to create a bogus Instrument with an id that doesn't match anything in your system.
Define equality as such:
public class Instrument : IEquatable<Instrument>
{
    /* all the useful stuff you already have */

    public bool Equals(Instrument other)
    {
        return other != null && Id == other.Id;
    }

    public override bool Equals(object other)
    {
        return Equals(other as Instrument);
    }

    public override int GetHashCode()
    {
        return Id;
    }
}
Now, especially when we consider that the above is likely to be inlined most of the time, there is pretty much no implementation difference as to whether we use the ID or the object in terms of equality, and hence also in terms of using them as a key.
Now, you can define your public methods in any of the following ways:
public InstrumentInfo GetInstrumentInfo(Instrument instrument)
{
    return instrumentInfos[instrument];
}
Or:
public InstrumentInfo GetInstrumentInfo(Instrument instrument)
{
    return instrumentInfos[instrument.Id];
}
Or:
public InstrumentInfo GetInstrumentInfo(Instrument instrument)
{
    return GetInstrumentInfo(instrument.Id);
}

private InstrumentInfo GetInstrumentInfo(int instrumentID)
{
    return instrumentInfos[instrumentID];
}
The performance impact will be the same, whichever you go for. The code presented to users will be type-safe and guarantee they don't pass in bogus values. The implementation picked can be simply that which you find more convenient for other reasons.
Since it won't cost you any more to use the instrument itself as a key internally, I'd still recommend you do that (the first of the three options above) as the type-safety and making it hard to pass in bogus values will then apply to your internal code too. If on the other hand you find that a set of calls keep just using the id anyway (if e.g. they are talking to a database layer to which only the ID means anything), then changing just those places becomes quick and easy for you, and hidden from the user.
You also give your users the ability to use your object as a key and to do quick equality comparisons, if it suits them to do so.
You could overload each function, one that takes an instrument and one that takes an id:
public InstrumentInfo GetInstrumentInfo(Instrument instrument)
{
    // call GetInstrumentInfo passing the id of the object
    return GetInstrumentInfo(instrument.Id);
}

public InstrumentInfo GetInstrumentInfo(int instrumentId)
{
    return instrumentInfos[instrumentId];
}
This will give you enough flexibility so that while you go through any place that calls GetInstrumentInfo to change it to pass id, the current code will still function.
As to whether or not you "should" is purely up to you. You would have to weigh how much time it would take to change it versus the benefit of making the change in the code.

Avoiding repeatedly allocating an Action object without a variable / member

Often I need to minimise object allocations within code that runs very frequently.
Of course I can use normal techniques like object pooling, but sometimes I just want something that's contained locally.
To try and achieve this, I came up with the below:
public static class Reusable<T> where T : new()
{
    private static T _Internal;
    private static Action<T> _ResetAction;

    static Reusable()
    {
        _Internal = Activator.CreateInstance<T>();
    }

    public static void SetResetAction(Action<T> resetAction)
    {
        _ResetAction = resetAction;
    }

    public static T Get()
    {
#if DEBUG
        if (_ResetAction == null)
        {
            throw new InvalidOperationException("You must set the reset action first");
        }
#endif
        _ResetAction(_Internal);
        return _Internal;
    }
}
Currently, the usage would be:
// In initialisation function somewhere
Reusable<List<int>>.SetResetAction((l) => l.Clear());
....
// In loop
var list = Reusable<List<int>>.Get();
// Do stuff with list
What I'd like to improve is the fact that the whole thing is not contained in one place (the .SetResetAction call is separate from where it's actually used).
I'd like to get the code to something like below:
// In loop
var list = Reusable<List<int>>.Get((l) => l.Clear());
// Do stuff with list
The problem with this is that I get an object allocation (it creates an Action<T>) every loop iteration.
Is it possible to get the usage I'm after without any object allocations?
Obviously I could create a ReusableList<T> which would have a built-in Action, but I want to allow for other cases where the action could vary.
Are you sure that creates a new Action<T> on each iteration? I suspect it actually doesn't, given that it doesn't capture any variables. I suspect if you look at the IL generated by the C# compiler, it will cache the delegate.
Of course, that's implementation-specific...
EDIT: (I was just leaving before I had time to write any more...)
As Eric points out in the comment, it's not a great idea to rely on this. It's not guaranteed, and it's easy to accidentally break it even when you don't change compiler.
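One rough way to check this at runtime, bearing in mind the result is compiler-specific and not something to rely on:
Action<List<int>> first = null, second = null;
for (int i = 0; i < 2; i++)
{
    Action<List<int>> a = l => l.Clear();   // captures nothing
    if (i == 0) first = a; else second = a;
}
// Often prints True, because the compiler may cache the non-capturing delegate,
// but this is an implementation detail rather than a guarantee.
Console.WriteLine(ReferenceEquals(first, second));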
Even the design of this looks worrying (thread safety?) but if you must do it, I'd probably turn it from a static class into a "normal" class which takes the reset method (and possibly the instance) in a constructor. That's a more flexible, readable and testable approach IMO.
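A sketch of that non-static shape, under the assumption that construction happens once outside the hot loop (names are illustrative, not from the question):
public sealed class Reusable<T> where T : new()
{
    private readonly T _instance = new T();
    private readonly Action<T> _resetAction;

    public Reusable(Action<T> resetAction)
    {
        if (resetAction == null) throw new ArgumentNullException("resetAction");
        _resetAction = resetAction;   // the delegate is allocated once, here
    }

    public T Get()
    {
        _resetAction(_instance);
        return _instance;
    }
}

// Usage: create once during initialisation, then call Get() in the loop
// without any further allocations.
var reusableList = new Reusable<List<int>>(l => l.Clear());
var list = reusableList.Get();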

Why does this code do if (sz != sz2) sz = sz2?

For the first time I created LINQ to SQL classes. I decided to look at the generated class and found this.
What... why is it doing if (sz != sz2) { sz = sz2; }? I don't understand. Why isn't the setter generated as this._Property1 = value?
private string _Property1;

[Column(Storage="_Property1", CanBeNull=false)]
public string Property1
{
    get
    {
        return this._Property1;
    }
    set
    {
        if ((this._Property1 != value))
        {
            this._Property1 = value;
        }
    }
}
It only updates the property if it has changed. This is probably based on the assumption that a comparison is cheaper than updating the reference (and any memory management that might be involved).
Where are you seeing that? The usual LINQ-to-SQL generated properties look like the following:
private string _Property1;

[Column(Storage="_Property1", CanBeNull=false)]
public string Property1
{
    get
    {
        return this._Property1;
    }
    set
    {
        if ((this._Property1 != value))
        {
            this.OnProperty1Changing(value);
            this.SendPropertyChanging();
            this._Property1 = value;
            this.SendPropertyChanged("Property1");
            this.OnProperty1Changed();
        }
    }
}
And now it's very clear that the device is to avoid sending property changing/changed notifications when the property is not actually changing.
Now, it turns out that OnProperty1Changing and OnProperty1Changed are partial methods so that if you don't declare a body for them elsewhere the calls to those methods will not be compiled into the final assembly (so if, say, you were looking in Reflector you would not see these calls). But SendPropertyChanging and SendPropertyChanged are protected methods that can't be compiled out.
So, did you perhaps change a setting that prevents the property changing/changed notifications from being emitted by the code generator?
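For reference, a minimal sketch of how those partial-method hooks work (simplified, with a hypothetical Customer class; this is not the generated LINQ to SQL code itself):
// In the generated half of the partial class:
public partial class Customer
{
    partial void OnProperty1Changing(string value);
    partial void OnProperty1Changed();

    // ... the generated setter calls the two methods above ...
}

// In your half of the partial class (optional). If you never provide a body,
// the compiler removes the calls to these methods entirely.
public partial class Customer
{
    partial void OnProperty1Changed()
    {
        Console.WriteLine("Property1 changed");
    }
}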
Setting a field won't cause property change notifications, so that's not the reason.
I would guess that this design choice was driven by something like the following:
String is an immutable reference type, so the original and new instances are interchangeable. However, the original instance may have been around longer and may therefore, on average, be slightly more expensive to collect (*). So performance may be better if the original instance is retained rather than replaced by a new, identical instance.
(*) The new value has in most cases only just been allocated, and won't be reused after the property is set. So it is very often a Gen0 object that is efficient to collect, whereas the original value's GC generation is unknown.
If this reasoning is correct, I wouldn't expect to see the same pattern for value-type properties (int, double, DateTime, ...).
But of course this is only speculation and I may be completely wrong.
Looks like there's persistence going on here. If something is using reflection (or a pointcut, or something) to create a SQL UPDATE query when _Property1 changes, then it'll be very much more expensive to update the field than to do the comparison.
It comes from Hejlsberg's Object Pascal roots... at least that's how most of the Borland Delphi VCL is implemented... ;)
