C# create List<T> in initialization vs get

C# create List<T> in initialization vs get - c#

I was wondering what is the best method to create a list in a certain object.
1) DefA "always" occupies memory beforehand even if it is never called, right?
2) DefB will "always" have to check for the null condition or does the compiler optimizes this?
3) Is there a better way to implement this?
Thanks
private List<A> _defA = new List<A>();
public List<A> DefA
{
get { return _defA; }
}
private List<B> _defB;
public List<B> DefB
{
get
{
if (_defB == null)
_defB = new List<B>();
return _defB;
}
}

Because I think both options will not affect on performance of your application, my suggestion to choose one which keep code cleaner
Use Lazy type - Lazy on MSDN
From MSDN about Lazy initialization:
By default, Lazy objects are thread-safe. That is, if the
constructor does not specify the kind of thread safety, the Lazy
objects it creates are thread-safe. In multi-threaded scenarios, the
first thread to access the Value property of a thread-safe Lazy
object initializes it for all subsequent accesses on all threads, and
all threads share the same data. Therefore, it does not matter which
thread initializes the object, and race conditions are benign.
So in your case
private Lazy<List<A>> _defA = new Lazy<List<A>>(() => new List<A>());
public List<A> DefA
{
get
{
return _defA.Value;
}
}
In addition this approach will tell your intents to other developers who may work with your code.

In this specific example, the delayed (lazy) instantiation might save a few milliseconds on startup; but at the risk of issues in a multi-threaded scenario.
Say two threads call DefB (Get) almost simultaneously - they might end up setting _defB twice, instead of the once that you intend.
_defA will always take the memory of an empty list, as I understand it, yes - so you'll save some memory the second way if it's not called - but it does make the code MUCH harder to understand. Also, what if a local piece of code doesn't call the accessor method, but just does _defB.Add() or whatever? (which might not be deliberate now, but because it's more complex it's easy to forget/miss in the future)

First of all, don't optimize something that doesn't need optimizing.
If you're creating thousands or millions of the object that contains that property, and this property is seldom used and thus seldom needed, then yes, adding lazy on-demand initialization is probably a good idea. I say probably because there may be other performance-related issues as well.
However, to answer your specific questions, other than "what is the best way":
The initialization of _defA will construct a List<A> object even if the property is never used, that is correct.
The getter method of DefB will always do the null check, that is also correct. The compiler cannot optimize this away.
As for "better way"? That part of the question falls into the "primarily opinion-based" close option here on Stack Overflow. It depends largely on what you determine is better:
More expressive syntax (shorter code)
Less memory spent (option B)
Less code in the getter (option A)
I can give you an alternative to the syntax in option A:
public List<A> DefA
{
get;
} = new List<A>();
This syntax is available in Visual Studio 2015 with C# 6 (even if you compiler for older .NET runtime versions) and is called Auto-property initializer.
The compiler will automagically create the backing field for you (the _defA equivalent) and mark it read-only, so feature-wise this is 100% identical to option A, it's just a different syntax.

Related

Is it best practice to create a variable if accessing a property of an object more than once in a routine?

When I first began as a junior C# dev, I was always told during code reviews that if I was accessing an object's property more than once in a given scope then I should create a local variable within the routine as it was cheaper than having to retrieve it from the object. I never really questioned it as it came from more people I perceived to be quite knowledgeable at the time.
Below is a rudimentary example
Example 1: storing an objects identifer in a local variable
public void DoWork(MyDataType object)
{
long id = object.Id;
if (ObjectLookup.TryAdd(id, object))
{
DoSomeOtherWork(id);
}
}
Example 2: retrieving the identifier from the Id property of the object property anytime it is needed
public void DoWork(MyDataType object)
{
if (ObjectLookup.TryAdd(object.Id, object))
{
DoSomeOtherWork(object.Id);
}
}
Does it actually matter or was it more a preference of coding style where I was working? Or perhaps a situational design time choice for the developer to make?

As explained in this answer, if the property is a basic getter/setter than the CLR "will inline the property access and generate code that’s as efficient as accessing a field directly". However, if your property, for example, does some calculations every time the property is accessed, then storing the value of the property in a local variable will avoid the overhead of additional calculations being done.

All the memory allocation stuff aside, there is the principle of DRY(don't repeat yourself). When you can deal with one variable with a short name rather than repeating the object nesting to access the external property, why not do that?
Apart from that, by creating that local variable you are respecting the single responsibility principle by isolating the methods from the external entity they don't need to know about.
And lastly if the so-called resuing leads to unwanted instantiation of reference types or any repetitive calculation, then it is a must to create the local var and reuse it throughout the class/method.
Any way you look at it, this practice helps with readability and more maintainable code, and possibly safer too.

I don't know if it is faster or not (though I would say that the difference is negligible and thus unimportant), but I'll cook up some benchmark for you.
What IS important though will be made evident to you with an example;
public Class MyDataType
{
publig int id {
get {
// Some actual code
return this.GetHashCode() * 2;
}
}
}
Does this make more sense? The first time I will access the id Getter, some code will be executed. The second time, the same code will be executed costing twice as much with no need.
It is very probable, that the reviewers had some such case in mind and instead of going into every single one property and check what you are doing and if it is safe to access, they created a new rule.
Another reason to store, would be useability.
Imagine the following example
object.subObject.someOtherSubObject.id
In this case I ask in reviews to store to a variable even if they use it just once. That is because if this is used in a complicated if statement, it will reduce the readability and maintainability of the code in the future.

A local variable is essentially guaranteed to be fast, whereas there is an unknown amount of overhead involved in accessing the property.
It's almost always a good idea to avoid repeating code whenever possible. Storing the value once means that there is only one thing to change if it needs changing, rather than two or more.
Using a variable allows you to provide a name, which gives you an opportunity to describe your intent.
I would also point out that if you're referring to other members of an object a lot in one place, that can often be a strong indication that the code you're writing actually belongs in that other type instead.

You should consider that getting a value from a method that is calculated from an I/O-bound or CPU-bound process can be irrational. Therefore, it's better to define a var and store the result to avoid multiple same processing.
In the case that you are using a value like object.Id, utilizing a variable decorated with const keyword guarantees that the value will not change in the scope.
Finally, it's better to use a local var in the classes and methods.

Is there any benefit of working on string directly rather than assigning it to a var?

Is there any benefit of doing this;
private void Method()
{
var data = ConfigurationManager.AppSettings["Data"].Split('-');
}
than doing this;
private void Method()
{
var _data = ConfigurationManager.AppSettings["Data"];
var data = _data.Split('-');
}
Case: I need to read bunch of configuration values like this in the same method, multiple times (let's say every time I instantiate this class).
How will both cases will affect the performance and memory? Or are they pretty much the same things? I see assigning it to a variable will allocate space on memory for no reason.

There will be the same IL code generated in both cases.
And don't forget about The Rules of Code Optimization

The compiler will reduce those to the exact same thing. No, there's no difference in this scenario. If you're ever curious, compile it in release mode, and use ildasm to look at what it did.
However! Performance questions should never be answered by hunch - or even asked on hunch. First, determine if you are actually trying to solve a real problem - otherwise you're probably just yak shaving.

In your first case since ConfigurationManager.AppSettings["Data"] will return a string there is no harm in chaining the Split() method with it than creating a extra variable.
In second case, it would be efficient if ConfigurationManager.AppSettings["Data"] would be used multiple places. In such case, instead of fetching it again and again, you fetch it once, store it to a variable and re-use it.

Both statements are equal. You have a false understanding on when space on your memory is allocated. This actually happens inside the AppSettings-call, not on assignement. Thus when you make any call to a member the result allready exists on memory. Storing this value in a variable does not increase anything - neither memory-allocation nor performance.
However if you´d store the result in a member of your class it´ll be garbage-collected far later than your local data-variable as it doesn´t get out of scope. In this case storing your result to the member will allocate memory as long as the instance exists.
Having said this it is in mostly all cases more important to focus on your code being maintainable, that is if other developers can understand it without asking what all this about.
This means you shouldn´t ask: which horse runs faster but instead which code is easier to understand?

Good or bad practice? Initializing objects in getter

I have a strange habit it seems... according to my co-worker at least. We've been working on a small project together. The way I wrote the classes is (simplified example):
[Serializable()]
public class Foo
{
public Foo()
{ }
private Bar _bar;
public Bar Bar
{
get
{
if (_bar == null)
_bar = new Bar();
return _bar;
}
set { _bar = value; }
}
}
So, basically, I only initialize any field when a getter is called and the field is still null. I figured this would reduce overload by not initializing any properties that aren't used anywhere.
ETA: The reason I did this is that my class has several properties that return an instance of another class, which in turn also have properties with yet more classes, and so on. Calling the constructor for the top class would subsequently call all constructors for all these classes, when they are not always all needed.
Are there any objections against this practice, other than personal preference?
UPDATE: I have considered the many differing opinions in regards to this question and I will stand by my accepted answer. However, I have now come to a much better understanding of the concept and I'm able to decide when to use it and when not.
Cons:
Thread safety issues
Not obeying a "setter" request when the value passed is null
Micro-optimizations
Exception handling should take place in a constructor
Need to check for null in class' code
Pros:
Micro-optimizations
Properties never return null
Delay or avoid loading "heavy" objects
Most of the cons are not applicable to my current library, however I would have to test to see if the "micro-optimizations" are actually optimizing anything at all.
LAST UPDATE:
Okay, I changed my answer. My original question was whether or not this is a good habit. And I'm now convinced that it's not. Maybe I will still use it in some parts of my current code, but not unconditionally and definitely not all the time. So I'm going to lose my habit and think about it before using it. Thanks everyone!

What you have here is a - naive - implementation of "lazy initialization".
Short answer:
Using lazy initialization unconditionally is not a good idea. It has its places but one has to take into consideration the impacts this solution has.
Background and explanation:
Concrete implementation:
Let's first look at your concrete sample and why I consider its implementation naive:
It violates the Principle of Least Surprise (POLS). When a value is assigned to a property, it is expected that this value is returned. In your implementation this is not the case for null:
foo.Bar = null;
Assert.Null(foo.Bar); // This will fail
It introduces quite some threading issues: Two callers of foo.Bar on different threads can potentially get two different instances of Bar and one of them will be without a connection to the Foo instance. Any changes made to that Bar instance are silently lost.
This is another case of a violation of POLS. When only the stored value of a property is accessed it is expected to be thread-safe. While you could argue that the class simply isn't thread-safe - including the getter of your property - you would have to document this properly as that's not the normal case. Furthermore the introduction of this issue is unnecessary as we will see shortly.
In general:
It's now time to look at lazy initialization in general:
Lazy initialization is usually used to delay the construction of objects that take a long time to be constructed or that take a lot of memory once fully constructed.
That is a very valid reason for using lazy initialization.
However, such properties normally don't have setters, which gets rid of the first issue pointed out above.
Furthermore, a thread-safe implementation would be used - like Lazy<T> - to avoid the second issue.
Even when considering these two points in the implementation of a lazy property, the following points are general problems of this pattern:
Construction of the object could be unsuccessful, resulting in an exception from a property getter. This is yet another violation of POLS and therefore should be avoided. Even the section on properties in the "Design Guidelines for Developing Class Libraries" explicitly states that property getters shouldn't throw exceptions:
Avoid throwing exceptions from property getters.
Property getters should be simple operations without any preconditions. If a getter might throw an exception, consider redesigning the property to be a method.
Automatic optimizations by the compiler are hurt, namely inlining and branch prediction. Please see Bill K's answer for a detailed explanation.
The conclusion of these points is the following:
For each single property that is implemented lazily, you should have considered these points.
That means, that it is a per-case decision and can't be taken as a general best practice.
This pattern has its place, but it is not a general best practice when implementing classes. It should not be used unconditionally, because of the reasons stated above.
In this section I want to discuss some of the points others have brought forward as arguments for using lazy initialization unconditionally:
Serialization:
EricJ states in one comment:
An object that may be serialized will not have it's contructor invoked when it is deserialized (depends on the serializer, but many common ones behave like this). Putting initialization code in the constructor means that you have to provide additional support for deserialization. This pattern avoids that special coding.
There are several problems with this argument:
Most objects never will be serialized. Adding some sort of support for it when it is not needed violates YAGNI.
When a class needs to support serialization there exist ways to enable it without a workaround that doesn't have anything to do with serialization at first glance.
Micro-optimization:
Your main argument is that you want to construct the objects only when someone actually accesses them. So you are actually talking about optimizing the memory usage.
I don't agree with this argument for the following reasons:
In most cases, a few more objects in memory have no impact whatsoever on anything. Modern computers have way enough memory. Without a case of actual problems confirmed by a profiler, this is pre-mature optimization and there are good reasons against it.
I acknowledge the fact that sometimes this kind of optimization is justified. But even in these cases lazy initialization doesn't seem to be the correct solution. There are two reasons speaking against it:
Lazy initialization potentially hurts performance. Maybe only marginally, but as Bill's answer showed, the impact is greater than one might think at first glance. So this approach basically trades performance versus memory.
If you have a design where it is a common use case to use only parts of the class, this hints at a problem with the design itself: The class in question most likely has more than one responsibility. The solution would be to split the class into several more focused classes.

It is a good design choice. Strongly recommended for library code or core classes.
It is called by some "lazy initialization" or "delayed initialization" and it is generally considered by all to be a good design choice.
First, if you initialize in the declaration of class level variables or constructor, then when your object is constructed, you have the overhead of creating a resource that may never be used.
Second, the resource only gets created if needed.
Third, you avoid garbage collecting an object that was not used.
Lastly, it is easier to handle initialization exceptions that may occur in the property then exceptions that occur during initialization of class level variables or the constructor.
There are exceptions to this rule.
Regarding the performance argument of the additional check for initialization in the "get" property, it is insignificant. Initializing and disposing an object is a more significant performance hit than a simple null pointer check with a jump.
Design Guidelines for Developing Class Libraries at http://msdn.microsoft.com/en-US/library/vstudio/ms229042.aspx
Regarding Lazy<T>
The generic Lazy<T> class was created exactly for what the poster wants, see Lazy Initialization at http://msdn.microsoft.com/en-us/library/dd997286(v=vs.100).aspx. If you have older versions of .NET, you have to use the code pattern illustrated in the question. This code pattern has become so common that Microsoft saw fit to include a class in the latest .NET libraries to make it easier to implement the pattern. In addition, if your implementation needs thread safety, then you have to add it.
Primitive Data Types and Simple Classes
Obvioulsy, you are not going to use lazy-initialization for primitive data type or simple class use like List<string>.
Before Commenting about Lazy
Lazy<T> was introduced in .NET 4.0, so please don't add yet another comment regarding this class.
Before Commenting about Micro-Optimizations
When you are building libraries, you must consider all optimizations. For instance, in the .NET classes you will see bit arrays used for Boolean class variables throughout the code to reduce memory consumption and memory fragmentation, just to name two "micro-optimizations".
Regarding User-Interfaces
You are not going to use lazy initialization for classes that are directly used by the user-interface. Last week I spent the better part of a day removing lazy loading of eight collections used in a view-model for combo-boxes. I have a LookupManager that handles lazy loading and caching of collections needed by any user-interface element.
"Setters"
I have never used a set-property ("setters") for any lazy loaded property. Therefore, you would never allow foo.Bar = null;. If you need to set Bar then I would create a method called SetBar(Bar value) and not use lazy-initialization
Collections
Class collection properties are always initialized when declared because they should never be null.
Complex Classes
Let me repeat this differently, you use lazy-initialization for complex classes. Which are usually, poorly designed classes.
Lastly
I never said to do this for all classes or in all cases. It is a bad habit.

Do you consider implementing such pattern using Lazy<T>?
In addition to easy creation of lazy-loaded objects, you get thread safety while the object is initialized:
http://msdn.microsoft.com/en-us/library/dd642331.aspx
As others said, you lazily-load objects if they're really resource-heavy or it takes some time to load them during object construction-time.

I think it depends on what you are initialising. I probably wouldn't do it for a list as the construction cost is quite small, so it can go in the constructor. But if it was a pre-populated list then I probably wouldn't until it was needed for the first time.
Basically, if the cost of construction outweighs the cost of doing an conditional check on each access then lazy create it. If not, do it in the constructor.

Lazy instantiation/initialization is a perfectly viable pattern. Keep in mind, though, that as a general rule consumers of your API do not expect getters and setters to take discernable time from the end user POV (or to fail).

The downside that I can see is that if you want to ask if Bars is null, it would never be, and you would be creating the list there.

I was just going to put a comment on Daniel's answer but I honestly don't think it goes far enough.
Although this is a very good pattern to use in certain situations (for instance, when the object is initialized from the database), it's a HORRIBLE habit to get into.
One of the best things about an object is that it offeres a secure, trusted environment. The very best case is if you make as many fields as possible "Final", filling them all in with the constructor. This makes your class quite bulletproof. Allowing fields to be changed through setters is a little less so, but not terrible. For instance:
class SafeClass
{
String name="";
Integer age=0;
public void setName(String newName)
{
assert(newName != null)
name=newName;
}// follow this pattern for age
...
public String toString() {
String s="Safe Class has name:"+name+" and age:"+age
}
}
With your pattern, the toString method would look like this:
if(name == null)
throw new IllegalStateException("SafeClass got into an illegal state! name is null")
if(age == null)
throw new IllegalStateException("SafeClass got into an illegal state! age is null")
public String toString() {
String s="Safe Class has name:"+name+" and age:"+age
}
Not only this, but you need null checks everywhere you might possibly use that object in your class (Outside your class is safe because of the null check in the getter, but you should be mostly using your classes members inside the class)
Also your class is perpetually in an uncertain state--for instance if you decided to make that class a hibernate class by adding a few annotations, how would you do it?
If you make any decision based on some micro-optomization without requirements and testing, it's almost certainly the wrong decision. In fact, there is a really really good chance that your pattern is actually slowing down the system even under the most ideal of circumstances because the if statement can cause a branch prediction failure on the CPU which will slow things down many many many more times than just assigning a value in the constructor unless the object you are creating is fairly complex or coming from a remote data source.
For an example of the brance prediction problem (which you are incurring repeatedly, nost just once), see the first answer to this awesome question: Why is it faster to process a sorted array than an unsorted array?

Let me just add one more point to many good points made by others...
The debugger will (by default) evaluate the properties when stepping through the code, which could potentially instantiate the Bar sooner than would normally happen by just executing the code. In other words, the mere act of debugging is changing the execution of the program.
This may or may not be a problem (depending on side-effects), but is something to be aware of.

Are you sure Foo should be instantiating anything at all?
To me it seems smelly (though not necessarily wrong) to let Foo instantiate anything at all. Unless it is Foo's express purpose to be a factory, it should not instantiate it's own collaborators, but instead get them injected in its constructor.
If however Foo's purpose of being is to create instances of type Bar, then I don't see anything wrong with doing it lazily.

C# Property Access Optimization

In C# (or VB .NET), does the compiler make attempts to optimize property accesses? For eg.,
public ViewClass View
{
get
{
...
Something is computed here
....
}
}
if (View != null)
View.Something = SomethingElse;
I would imagine that if the compiler could somehow detect that View remains constant between the two accesses, it can refrain from computing the value twice. Are these kind of optimizations performed?
I understand that if View has some intensive computations, it should probably be refactored into a function (GetView()). In my particular case, View involves climbing the visual tree looking for an element of a particular type.
Related: Any references on the workings of the (Microsoft) C# compiler?

Not in general, no. As Steven mentioned there are numerous factors to consider regarding multithreading, if you truly are computing something that might change, you're correct it should be refactored away from a property. If it won't change, you should lazy-load it (check if the private member is null, if so then calculate, then return the value).
If it won't change and depends on a parameter, you can use a Dictionary or Hashtable as a cache - given the parameter (key) you will store the value. You could have each entry as a WeakReference to the value too, so when the value isn't referenced anywhere and garbage collection happens, the memory will be freed.
Hope that helps.

The question is very unclear, it isn't obvious to me how the getter and the snippet below it are related. But yes, property accessors are normally heavily optimized. Not by the C# compiler, by the JIT compiler. For one, they are often inlined so you don't pay for the cost of a method call.
That will only happen if the getter doesn't contain too much code and doesn't monkey with locks and exception handling. You can help the JIT compiler to optimize the common case with code like this:
get
{
if (_something == null) {
_something = createSomething();
}
return _something;
}
This will inline the common case and allow the creation method to remain un-inlined. This gets typically compiled to three machine code instructions in the Release build (load + test + jump), about a nano-second of execution time. It is a micro-optimization, seeing an actual perf improvement would be quite rare.
Do note that the given sample code is not thread-safe. Always write correct code rather than fast code first.

No, which is why you should use Lazy<T> to implement a JIT calculation.

From my understanding there is no implicit caching - you have to cache the value of a given property yourself the first time it is calculated
For example:
object mCachedValue = null;
public Object MyProperty
{
get
{
if (mCachedValue == null)
{
lock(mCachedValue)
{
//after acquiring the lock check if the property has not been initialized in the mean time - only calculate once
if (mCachedValue == null)
{
//calculate value the first time
}
}
}
return mCachedValue;
}

Avoiding array duplication

According to [MSDN: Array usage guidelines](http://msdn.microsoft.com/en-us/library/k2604h5s(VS.71).aspx):
Array Valued Properties
You should use collections to avoid code inefficiencies. In the following code example, each call to the myObj property creates a copy of the array. As a result, 2n+1 copies of the array will be created in the following loop.
[Visual Basic]
Dim i As Integer
For i = 0 To obj.myObj.Count - 1
DoSomething(obj.myObj(i))
Next i
[C#]
for (int i = 0; i < obj.myObj.Count; i++)
DoSomething(obj.myObj[i]);
Other than the change from myObj[] to ICollection myObj, what else would you recommend? Just realized that my current app is leaking memory :(
Thanks;
EDIT: Would forcing C# to pass references w/ ref (safety aside) improve performance and/or memory usage?

No, it isn't leaking memory - it is just making the garbage collector work harder than it might. Actually, the MSDN article is slightly misleading: if the property created a new collection every time it was called, it would be just as bad (memory wise) as with an array. Perhaps worse, due to the usual over-sizing of most collection implementations.
If you know a method/property does work, you can always minimise the number of calls:
var arr = obj.myObj; // var since I don't know the type!
for (int i = 0; i < arr.Length; i++) {
DoSomething(arr[i]);
}
or even easier, use foreach:
foreach(var value in obj.myObj) {
DoSomething(value);
}
Both approaches only call the property once. The second is clearer IMO.
Other thoughts; name it a method! i.e. obj.SomeMethod() - this sets expectation that it does work, and avoids the undesirable obj.Foo != obj.Foo (which would be the case for arrays).
Finally, Eric Lippert has a good article on this subject.

Just as a hint for those who haven't use the ReadOnlyCollection mentioned in some of the answers:
[C#]
class XY
{
private X[] array;
public ReadOnlyCollection<X> myObj
{
get
{
return Array.AsReadOnly(array);
}
}
}
Hope this might help.

Whenever I have properties that are costly (like recreating a collection on call) I either document the property, stating that each call incurs a cost, or I cache the value as a private field. Property getters that are costly, should be written as methods.
Generally, I try to expose collections as IEnumerable rather than arrays, forcing the consumer to use foreach (or an enumerator).

It will not make copies of the array unless you make it do so. However, simply passing the reference to an array privately owned by an object has some nasty side-effects. Whoever receives the reference is basically free to do whatever he likes with the array, including altering the contents in ways that cannot be controlled by its owner.
One way of preventing unauthorized meddling with the array is to return a copy of the contents. Another (slightly better) is to return a read-only collection.
Still, before doing any of these things you should ask yourself if you are about to give away too much information. In some cases (actually, quite often) it is even better to keep the array private and instead let provide methods that operate on the object owning it.

myobj will not create new item unless you explicitly create one. so to make better memory usage I recommend to use private collection (List or any) and expose indexer which will return the specified value from the private collection

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.