Does object creation overhead apply to structs? - c#

I'm asking for a project in C#, but I assume this question applies to other languages as well. I've heard that massive object creation and destruction causes massive overhead and performance issues. I'm wondering if I can get around this by simply using structs instead of objects.

"Making struct instead of object" - as you term it (I suppose what you mean by object is class) would most likely be of little help since creating struct instance, due to struct's nature, will require you to refer it by value rather than by by reference - and this may (not always) make your memory use heavier
That being said, what you probably need is Flyweight Design Pattern
From https://sourcemaking.com/design_patterns/flyweight:
Flyweight Design Pattern
Intent
Use sharing to support large numbers of fine-grained objects efficiently.
The Motif GUI strategy of replacing heavy-weight widgets with light-weight gadgets.
Problem
Designing objects down to the lowest levels of system "granularity" provides optimal flexibility, but can be unacceptably expensive in terms of performance and memory usage.
Discussion
The Flyweight pattern describes how to share objects to allow their use at fine granularities without prohibitive cost. Each "flyweight" object is divided into two pieces: the state-dependent (extrinsic) part, and the state-independent (intrinsic) part. Intrinsic state is stored (shared) in the Flyweight object. Extrinsic state is stored or computed by client objects, and passed to the Flyweight when its operations are invoked.
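To make the idea concrete, here is a minimal C# sketch of a flyweight factory; the Glyph type and its members are hypothetical, chosen only for illustration.

using System;
using System.Collections.Generic;

// Minimal Flyweight sketch: the shared (intrinsic) state lives in Glyph,
// while the position (extrinsic state) is supplied by the caller on each call.
public sealed class Glyph
{
    public char Character { get; }              // intrinsic, shared state

    public Glyph(char character) { Character = character; }

    public void Draw(int x, int y)              // extrinsic state passed in
    {
        Console.WriteLine($"'{Character}' at ({x}, {y})");
    }
}

public static class GlyphFactory
{
    private static readonly Dictionary<char, Glyph> Cache = new Dictionary<char, Glyph>();

    // Returns the single shared instance per character instead of allocating a new one.
    public static Glyph Get(char c)
    {
        if (!Cache.TryGetValue(c, out var glyph))
            Cache[c] = glyph = new Glyph(c);
        return glyph;
    }
}

Drawing the text "aaab" through the factory creates only two Glyph instances instead of four; only the coordinates vary per call.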

Here are some facts you should know about struct and class in C#:
A struct in C# is typically faster to create than a class, since it is allocated on the stack rather than on the heap.
A struct is a value type, a class is a reference type. Passing a reference type around (as a parameter, in a copy, ...) is much faster than passing a large value type, because only the reference is copied. See "Difference between struct and class".
struct fields are faster to access than class fields, since they are allocated on the stack.
Here are some facts about how the GC works in .NET:
You have no control over when the GC is triggered by the CLR; it can interrupt your program at any time (there are options you can use to tell the CLR that you are running a sensitive part of the code, but they do not prevent the GC from running if memory is needed - see GC Latency Modes).
You have no control over how long the GC takes to do its work.
When the GC is doing a full collection, it freezes all of your program's threads (depending on whether you are in gcConcurrent or gcServer mode - see gcServer mode).
Knowing all of that, and to keep it short: if you don't want your program to suffer from GC work, use reference types for objects that will live the longest in your program, and value types for objects that are used for a very short time and in a very localized scope.
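As a rough sketch of the latency-mode option mentioned above (GCSettings and GCLatencyMode are real .NET APIs; the hot-path code is just a placeholder):

using System;
using System.Runtime;

// Sketch: asking the CLR to avoid blocking collections around a latency-sensitive
// section. This is a request, not a guarantee; a full collection can still happen
// if memory pressure demands it.
class LatencySensitiveLoop
{
    static void Main()
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            RunHotPath();
        }
        finally
        {
            GCSettings.LatencyMode = previous;      // always restore the previous mode
        }
    }

    static void RunHotPath()
    {
        // Hot path: small value types in a tight, local scope produce no garbage at all.
        double total = 0;
        for (int i = 0; i < 1000000; i++)
            total += i * 0.5;
        Console.WriteLine(total);
    }
}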

Related

Limitations of dynamic objects in C#/Java

I'm basically a C++ guy trying to venture into C#. From the basic tutorial of C#, I happen to find that all objects are created and stored dynamically (also true for Java) and are accessed by references and hence there's no need for copy constructors. There is also no need of bitwise copy when passing objects to a function or returning objects from a function. This makes C# much simpler than C++.
However, I read somewhere that operating on objects exclusively through references imposes limitations on the type of operations that one can perform thus restricting the programmer of complete control. One limitation is that the programmer cannot precisely specify when an object can be destroyed.
Can someone please elaborate on other limitations? (with a sample code if required)
Most of the "limitations" are by design rather than considered a deficiency (you may not agree of course)
You cannot determine, and don't have to worry about:
when an object is destroyed
where the object is in memory
how big it is (unless you are tuning the application)
using pointer arithmetic
accessing memory outside an object
accessing an object with the wrong type
sharing objects between threads (this is simpler)
whether the object is on the stack or the heap. (The stack is being used more and more in Java)
fragmentation of memory (This is not true of all collectors)
Because of the garbage collection done in Java, we cannot predict when an object will be destroyed, but the collector performs the work of a destructor.
If you want to free up resources, you can use a finally block:
try {
    // use the resource here
} finally {
    // dispose resources
}
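In C#, the more idiomatic equivalent is a using block over an IDisposable resource, which calls Dispose for you on exit; a small sketch (the StreamReader and file name are just example choices):

using (var reader = new System.IO.StreamReader("data.txt"))   // "data.txt" is a placeholder file
{
    System.Console.WriteLine(reader.ReadLine());
}   // reader.Dispose() runs here, even if an exception was thrown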
Having made a similar transition, the more you look into it, the more you do have to think about C#'s GC behaviour in all but the most straightforward cases. This is especially true when trying to handle unmanaged resources from managed code.
This article highlights a lot of the issues you may be interested in.
Personally I miss a reference counted alternative to IDisposable (more like shared_ptr), but that's probably a hangover from a C++ background.
The more I have to write my own plumbing to support C++-like programming, the more likely it is that there is another C# mechanism I've overlooked, or that I will end up getting frustrated with C#. For example, swap and move are not common idioms in C# as far as I've seen, and I miss them; other programmers with a C# background may well disagree about how useful those idioms are.
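For what a shared_ptr-like, reference-counted alternative to plain IDisposable might look like, here is a rough sketch; SharedHandle<T> is a made-up type, not part of the .NET framework.

using System;
using System.Threading;

// Hypothetical shared_ptr-style handle: the wrapped resource is disposed
// deterministically when the last handle is disposed. Not a standard .NET type.
public sealed class SharedHandle<T> : IDisposable where T : class, IDisposable
{
    private sealed class Counter
    {
        public T Resource;
        public int Count;
    }

    private Counter _shared;

    public SharedHandle(T resource)
    {
        _shared = new Counter { Resource = resource, Count = 1 };
    }

    private SharedHandle(Counter shared) { _shared = shared; }

    public T Value => _shared.Resource;

    // Like copying a shared_ptr: bump the count and hand out another handle.
    public SharedHandle<T> Share()
    {
        Interlocked.Increment(ref _shared.Count);
        return new SharedHandle<T>(_shared);
    }

    public void Dispose()
    {
        var shared = Interlocked.Exchange(ref _shared, null);
        if (shared != null && Interlocked.Decrement(ref shared.Count) == 0)
            shared.Resource.Dispose();              // last owner: release the resource now
    }
}

Each consumer disposes its own handle (ideally in a using block), and the underlying resource is released the moment the last one does - deterministically, unlike finalization.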

Local variables vs instance variables

I've been doing a lot of research on C# optimization for a game that I'm building with XNA, and I still don't quite understand whether local variables or instance variables give better performance when constantly being updated and used.
According to http://www.dotnetperls.com/optimization , you should avoid parameters and local variables, meaning instance variables are the best option in terms of performance.
But a while ago, I read on another StackOverflow post (I can't seem to find where it was) that local variables are stored in a part of memory that is far quicker to access, and that every time an instance variable is set, the previous value has to be erased as a tedious extra step before a new value can be assigned.
I know that design-wise, it might break encapsulation to use instance variables in that kind of situation, but I'm strictly curious about performance. Currently in my game, I pass around local variables to 3 out of 7 methods in a class, but I could easily promote the variables to instance variables and be able to entirely avoid parameter passing and local variables.
So which would be better?
Are your variables reference (class, or string) or value (struct) types?
For reference types there's no meaningful difference between passing them as a method argument and holding them on an object instance. In the first case when entering the function the argument will (for functions with a small argument count) end up in a register. In the second case the reference exists as an offset of the data pointed to in memory by 'this'. Either scenario is a quick grab of a memory address and then fetching the associated data out of memory (this is the expensive part).
For value types the above is true for certain types (integers or floats that can fit in your CPU's registers). For those specific things it's probably a little cheaper to pass by value than to extract them off 'this'. For other value types (DateTime, structs you might make yourself, or any struct with multiple members), the data is going to be too large to pass through a register, so this no longer matters.
It's pretty unlikely, though, that any of this matters for the performance of your application (even a game). Most common .NET performance problems (that are not simply inefficient algorithms) are going to, in some form, come from garbage generation. This can manifest itself through accidental boxing, poor use of string building, or poor object lifetime management (your objects have lifespans that are neither very short nor very long/permanent).
Personally, I wouldn't be looking at this as the culprit for performance issues (unless you are constantly passing large structs). My naive understanding is that GC pressure is the usual consideration with XNA games, so the main thing is to be frugal with your object instances.
If the variable is method-local, the value itself or the reference (when a reference type) will be located on the stack. If you promote those to class member variables they will be located in the class's memory on the heap.
Method calls would technically become faster as you are no longer copying references or values on the call (because presumably you can remove the parameters from the method if the method is also local to the class).
I'm not sure about the relative performance, but to me it seems that if you need to persist the value, it makes some sense for it to live in the class...
To me it seems like any potential gains from doing one in favour of the other is outweighed by the subtle differences between the two - making them roughly equivalent or so small a difference as to not care.
Of course, all this stands to be corrected in the face of hard numbers from performance profiling.
Not passing arguments will be slightly faster; not initialising local objects (if they are objects) will also be faster.
What you read in the two articles is not contradictory: one mentions that passing arguments costs time, and the other mentions that initialising (local) objects can cost time as well.
About allocating new objects, one thing you can do is to reuse objects rather than discarding them. For example, some time ago I had to write code for traders which would compute the price of a few products in real time, in C/C++ and C#. I obtained a major performance boost by not re-creating the system of equations from scratch, but merely updating it incrementally from the previous system.
This avoided allocating memory for new objects and initialising them, and often the system would be nearly the same, so I only had to modify tiny bits to update it.
As usual, before doing any optimisation, you want to make sure that you are optimising something that will significantly impact the overall performance.

Comparing shared_ptr to managed language references

Could someone explain to a C++ programmer most important differences between Java (and C# as well) references and shared_ptr (from Boost or from C++0x).
I am more or less aware of how shared_ptr is implemented. I am curious about differences in the following areas:
1) Performance.
2) Cycling. shared_ptr can be cycled (A and B hold pointers to each other). Is cycling possible in Java?
3) Anything else?
Thank you.
Performance: shared_ptr performs pretty well, but in my experience is slightly less efficient than explicit memory management, mostly because it is reference counted and the reference count has to be allocated as well. How well it performs depends on a lot of factors, and how well it compares to Java/C# garbage collectors can only be determined on a per-use-case basis (it depends on the language implementation, among other factors).
Cycling is only safe if you break the cycle with a weak_ptr; two shared_ptrs that refer to each other will never be reclaimed. Java allows cycling without further ado; its garbage collector will collect the cycles. My guess is that C# does the same.
Anything else: the object pointed to by a shared_ptr is destroyed as soon as the last reference to it goes out of scope. The destructor is called immediately. In Java, the finalizer may not be called immediately. I don't know how C# behaves on this point.
The key difference is that when the shared pointer's use count goes to zero, the object it points to is destroyed (destructor is called and object is deallocated), immediately. In Java and C# the deallocation of the object is postponed until the Garbage Collector chooses to deallocate the object (i.e., it is non-deterministic).
With regard to cycles, I am not sure I understand what you mean. It is quite common in Java and C# to have two objects that contain member fields that refer to each other, thus creating a cycle. For example a car and an engine - the car refers to the engine via an engine field and the engine can refer to its car via a car field.
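To make that concrete, here is a small C# sketch (the Car/Engine names follow the example above); unlike a shared_ptr cycle, this pair does not leak once nothing else references it.

// Sketch: a reference cycle that the .NET GC can still reclaim. With shared_ptr,
// this pair would leak unless one side of the cycle used weak_ptr.
class Car    { public Engine Engine; }
class Engine { public Car Car; }

class Program
{
    static void Main()
    {
        var car = new Car();
        var engine = new Engine();
        car.Engine = engine;        // car -> engine
        engine.Car = car;           // engine -> car (cycle)

        car = null;
        engine = null;
        // Both objects are now unreachable from any root; the tracing GC will
        // collect them, cycle and all, during some later collection.
    }
}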
Nobody has pointed out the possibility of the memory manager moving objects around in managed memory. So in C# there are no simple references/pointers; they work more like IDs describing an object, which the manager resolves.
In C++ you can't achieve this with shared_ptr, because the object stays in the same location after it has been created.
First of all, Java/C# have only pointers, not references, though they call them that way. Reference is a unique C++ feature. Garbage collection in Java/C# basically means infinite life-time. shared_ptr on the other hand provides sharing and deterministic destruction, when the count goes to zero. Therefore, shared_ptr can be used to automatically manage any resources, not just memory allocation. In a sense (just like any RAII design) it turns pointer semantics into more powerful value semantics.
Cyclical references with C++ reference-counted pointers will not be disposed. You can use weak pointers to work around this. Cyclical references in Java or C# may be disposed, when the garbage collector feels like it.
When the count in a C++ reference-counted pointer drops to zero, the destructor is called. When a Java object is no longer reachable, its finalizer may not be called promptly or ever. Therefore, for objects which require explicit disposal of external resources, some form of explicit call is required.

Has anyone written or knows of a custom memory manager written in C#?

We have a certain application written in C# and we would like it to stay that way. The application manipulates many small and short-lived chunks of dynamic memory. It also appears to be sensitive to GC interruptions.
We think that one way to reduce GC is to allocate 100K chunks and then allocate memory from them using a custom memory manager. Has anyone encountered custom memory manager implementations in C#?
Perhaps you should consider using some sort of pooling architecture, where you preallocate a number of items up front and then lease them from the pool. This keeps the memory requirements nicely pinned down. There are a few implementations on MSDN that might serve as a reference:
http://msdn2.microsoft.com/en-us/library/bb517542.aspx
http://msdn.microsoft.com/en-us/library/system.net.sockets.socketasynceventargs.socketasynceventargs.aspx
...or I can offer my generic implementation if required.
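As a rough sketch of such a pool (a generic, made-up ObjectPool<T>, not one of the MSDN implementations linked above):

using System;
using System.Collections.Concurrent;

// Minimal pool sketch: pre-allocate instances once, then lease and return them,
// so steady-state operation produces little or no garbage.
public sealed class ObjectPool<T>
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;

    public ObjectPool(Func<T> factory, int preallocate)
    {
        _factory = factory;
        for (int i = 0; i < preallocate; i++)
            _items.Add(factory());
    }

    // Lease an instance; falls back to creating one if the pool happens to be empty.
    public T Rent() => _items.TryTake(out var item) ? item : _factory();

    // Hand an instance back so it can be reused instead of becoming garbage.
    public void Return(T item) => _items.Add(item);
}

Usage might look like: var pool = new ObjectPool<byte[]>(() => new byte[100 * 1024], 50); then pool.Rent() a chunk, use it, and pool.Return() it when done.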
Memory management of all types that descend from System.Object is performed by the garbage collector (with the exception of structures/primitives stored on the stack). In Microsoft's implementation of the CLR, the garbage collector cannot be replaced.
You can allocate some primitives on the heap manually inside of an unsafe block and then access them via pointers, but it's not recommended.
Likely, you should profile and migrate classes to structures accordingly.
The obvious option is using Marshal.AllocHGlobal and Marshal.FreeHGlobal. I also have a copy of the DougLeaAllocator (dlmalloc) written and battle-tested in C#. If you want, I can get that out to you. Either way will require careful, consistent usage of IDisposable.
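Here is a sketch of the kind of careful, consistent IDisposable usage the Marshal.AllocHGlobal route requires; the UnmanagedBuffer wrapper is made up for illustration.

using System;
using System.Runtime.InteropServices;

// An unmanaged allocation that the GC never sees; Dispose (or the finalizer,
// as a safety net) must free it, otherwise the memory simply leaks.
public sealed class UnmanagedBuffer : IDisposable
{
    public IntPtr Pointer { get; private set; }

    public UnmanagedBuffer(int bytes)
    {
        Pointer = Marshal.AllocHGlobal(bytes);
    }

    public void Dispose()
    {
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero;
        }
        GC.SuppressFinalize(this);
    }

    ~UnmanagedBuffer()
    {
        Dispose();
    }
}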
The only time items are collected in garbage collection is when there are no more references to the object.
You should make a static class or something to keep a lifetime reference to the object for the life of the application.
If you want to manage your own memory it is possible using unsafe in C#, but you would be better to choose a language that wasn't managed like C++.
Although I don't have experience with it, you can try to write C# unmanaged code.
Or you can extend an object's lifetime up to a particular point in your code by calling
GC.KeepAlive(obj);
(Note that this only guarantees the object is not collected before that call is reached; it does not exempt the object from collection afterwards.)

Should a programmer really care about how many and/or how often objects are created in .NET?

This question has been puzzling me for a long time now. I come from a heavy and long C++ background, and since I started programming in C# and dealing with garbage collection I always had the feeling that such 'magic' would come at a cost.
I recently started working on a big MMO project written in Java (server side). My main task is to optimize memory consumption and CPU usage. Hundreds of thousands of messages per second are being sent, and the same number of objects is created as well. After a lot of profiling we discovered that the VM garbage collector was eating a lot of CPU time (due to constant collections) and decided to try to minimize object creation, using pools where applicable and reusing everything we can. This has proven to be a really good optimization so far.
So, from what I've learned, having a garbage collector is awesome, but you can't just pretend it does not exist, and you still need to take care about object creation and what it implies (at least in Java and a big application like this).
So, is this also true for .NET? If it is, to what extent?
I often write pairs of functions like these:
// Combines two envelopes; the result is stored in a new envelope.
public static Envelope Combine( Envelope a, Envelope b )
{
    var envelope = new Envelope( a.Length, 0, 1, 1 );
    Combine( a, b, envelope );
    return envelope;
}
// Combines two envelopes; the result is 'written' to the specified envelope.
public static void Combine( Envelope a, Envelope b, Envelope result )
{
    result.Clear();
    ...
}
A second function is provided in case someone has an already made Envelope that may be reused to store the result, but I find this a little odd.
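For what it's worth, the second overload pays off most in a loop, where a single preallocated result can be reused; a sketch, assuming the Envelope API above (the pairs collection and Process call are hypothetical):

// Sketch: reuse one preallocated result instead of allocating a new Envelope per iteration.
var result = new Envelope(1024, 0, 1, 1);      // 1024 is an arbitrary length for this sketch
foreach (var (a, b) in pairs)                  // 'pairs' is a hypothetical sequence of envelope pairs
{
    Combine(a, b, result);                     // writes into 'result' instead of allocating
    Process(result);                           // hypothetical consumer of the combined data
}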
I also sometimes write structs when I'd rather use classes, just because I know there'll be tens of thousands of instances being constantly created and disposed, and this feels really odd to me.
I know that as a .NET developer I shouldn't be worrying about this kind of issue, but my experience with Java and common sense tell me that I should.
Any light and thoughts on this matter would be much appreciated. Thanks in advance.
Yes, it's true of .NET as well. Most of us have the luxury of ignoring the details of memory management, but in your case -- or in cases where high volume is causing memory congestion -- some optimization is called for.
One optimization you might consider for your case -- something I've been thinking about writing an article about, actually -- is the combination of structs and ref for real deterministic disposal.
Since you come from a C++ background, you know that in C++ you can instantiate an object either on the heap (using the new keyword and getting back a pointer) or on the stack (by instantiating it like a primitive type, i.e. MyType myType;). You can pass stack-allocated items by reference to functions and methods by telling the function to accept a reference (using the & keyword before the parameter name in your declaration). Doing this keeps your stack-allocated object in memory for as long as the method in which it was allocated remains in scope; once it goes out of scope, the object is reclaimed, the destructor is called, ba-da-bing, ba-da-boom, Bob's yer Uncle, and all done without pointers.
I used that trick to create some amazingly performant code in my C++ days -- at the expense of a larger stack and the risk of a stack overflow, naturally, but careful analysis managed to keep that risk very minimal.
My point is that you can do the same trick in C# using structs and refs. The tradeoffs? In addition to the risk of a stack overflow if you're not careful or if you use large objects, you are limited to no inheritance, and you tightly couple your code, making it less testable and less maintainable. Additionally, you still have to deal with issues whenever you use core library calls.
Still, it might be worth a look-see in your case.
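A minimal sketch of that structs-plus-ref trick (the Particle type and the numbers are made up for illustration):

// Sketch: a stack-allocated struct passed by ref - no heap allocation, no GC work,
// and the callee updates the caller's data in place rather than a copy.
struct Particle
{
    public float X, Y, VelocityX, VelocityY;
}

static class Simulation
{
    static void Integrate(ref Particle p, float dt)
    {
        p.X += p.VelocityX * dt;
        p.Y += p.VelocityY * dt;
    }

    static void Main()
    {
        var p = new Particle { VelocityX = 1f, VelocityY = 2f };
        Integrate(ref p, 0.016f);               // no copy: 'p' itself is updated
        System.Console.WriteLine($"{p.X}, {p.Y}");
    }
}

Because Particle is a struct local to Main and is passed by ref, the GC never gets involved and the caller's data is mutated in place, much like the C++ stack-plus-reference idiom described above.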
This is one of those issues where it is really hard to pin down a definitive answer in a way that will help you. The .NET GC is very good at tuning itself to the memory needs of your application. Is it good enough that your application can be coded without you needing to worry about memory management? I don't know.
There are definitely some common-sense things you can do to ensure that you don't hammer the GC. Using value types is definitely one way of accomplishing this but you need to be careful that you don't introduce other issues with poorly-written structs.
For the most part however I would say that the GC will do a good job managing all this stuff for you.
I've seen too many cases where people "optimize" the crap out of their code without much concern for how well it's written or how well it works even. I think the first thought should go towards making code solve the business problem at hand. The code should be well crafted and easily maintainable as well as properly tested.
After all of that, optimization should be considered, if testing indicates it's needed.
Random advice:
Someone mentioned putting dead objects in a queue to be reused instead of letting the GC at them... but be careful, as this means the GC may have more crap to move around when it consolidates the heap, and may not actually help you. Also, the GC is possibly already using techniques like this. Also, I know of at least one project where the engineers tried pooling and it actually hurt performance. It's tough to get a deep intuition about the GC. I'd recommend having a pooled and unpooled setting so you can always measure the perf differences between the two.
Another technique you might use in C# is dropping down to native C++ for key parts that aren't performing well enough... and then use the Dispose pattern in C# or C++/CLI for managed objects which hold unmanaged resources.
Also, be sure when you use value types that you are not using them in ways that implicitly box them and put them on the heap, which might be easy to do coming from Java.
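A few of the classic accidental-boxing shapes, as a sketch (these examples are illustrative, not from the original answer):

using System;
using System.Collections;
using System.Collections.Generic;

class BoxingExamples
{
    static void Main()
    {
        int i = 42;

        object boxed = i;                                   // boxes: a new heap object
        Console.WriteLine(boxed);
        Console.WriteLine(string.Format("{0}", i));         // boxes 'i' into the params object[]

        var oldList = new ArrayList();
        oldList.Add(i);                                     // non-generic collections box every value type

        var newList = new List<int>();
        newList.Add(i);                                     // the generic equivalent stores the int directly, no boxing
    }
}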
Finally, be sure to find a good memory profiler.
I have the same thought all the time.
The truth is that, though we were taught to watch out for unnecessary CPU cycles and memory consumption, the cost of little imperfections in our code is just negligible in practice.
If you are aware of that and keep an eye on it, I believe you are okay writing imperfect code. If you have started with .NET/Java and have no prior experience in low-level programming, the chances are you will write very wasteful and inefficient code.
And anyway, as they say, premature optimization is the root of all evil. You can spend hours optimizing one little function and then find that some other part of the code is the real bottleneck. Just keep a balance: keep it simple and keep it straightforward.
Although the Garbage Collector is there, bad code remains bad code. Therefore I would say yes as a .Net developer you should still care about how many objects you create and more importantly writing optimized code.
I have seen a considerable number of projects get rejected for this reason in code reviews in our environment, and I strongly believe it is important.
.NET Memory management is very good and being able to programmatically tweak the GC if you need to is good.
I like the fact that you can create your own Dispose methods on classes by implementing IDisposable and tweaking it to your needs. This is great for making sure that connections to networks/files/databases are always cleaned up and not leaked. There is also the worry of cleaning up too early.
You might consider writing a set of object caches. Instead of creating new instances, you could keep a list of available objects somewhere. It would help you avoid the GC.
I agree with all points said above: the garbage collector is great, but it shouldn't be used as a crutch.
I've sat through many wasted hours in code reviews debating finer points of the CLR. The best definitive answer is to develop a culture of performance in your organization and to actively profile your application with a tool. Bottlenecks will appear and you can address them as needed.
I think you answered your own question -- if it becomes a problem, then yes! I don't think this is a .Net vs. Java question, it's really a "should we go to exceptional lengths to avoid doing certain types of work" question. If you need better performance than you have, and after doing some profiling you find that object instantiation or garbage collection is taking tons of time, then that's the time to try some unusual approach (like the pooling you mentioned).
I wish I were a "software legend" and could speak of this in my own voice and breath, but since I'm not, I rely upon SL's for such things.
I suggest the following blog post by Andrew Hunter on .NET GC would be helpful:
http://www.simple-talk.com/dotnet/.net-framework/understanding-garbage-collection-in-.net/
Even beyond performance aspects, the semantics of a method which modifies a passed-in mutable object will often be cleaner than those of a method which returns a new mutable object based upon an old one. The statements:
munger.Munge(someThing, otherParams);
someThing = munger.ComputeMungedVersion(someThing, otherParams);
may in some cases behave identically, but while the former does one thing, the latter will do two--equivalent to:
someThing = someThing.Clone(); // Or duplicate it via some other means
munger.Munge(someThing, otherParams);
If someThing is the only reference, anywhere in the universe, to a particular object, then replacing it with a reference to a clone will be a no-op, and so modifying a passed-in object will be equivalent to returning a new one. If, however, someThing identifies an object to which other references exist, the former statement would modify the object identified by all those references, leaving all the references attached to it, while the latter would cause someThing to become "detached".
Depending upon the type of someThing and how it is used, its attachment or detachment may be moot issues. Attachment would be relevant if some object which holds a reference to the object could modify it while other references exist. Attachment is moot if the object will never be modified, or if no references outside someThing itself can possibly exist. If one can show that either of the latter conditions will apply, then replacing someThing with a reference to a new object will be fine. Unless the type of someThing is immutable, however, such a demonstration would require documentation beyond the declaration of someThing, since .NET provides no standard means of annotating that a particular reference will identify an object which--despite its being of mutable type--nobody is allowed to modify, nor of annotating that a particular reference should be the only one anywhere in the universe that identifies a particular object.
