Why does overloading ++ take significantly longer than incrementing just the value?


Why does incrementing a uint (in my case) by one 100,000,000 times take ~0.175 seconds, while incrementing a uint inside a struct the same number of times takes ~1.21 seconds?
The tests have been conducted roughly 10 times with nearly the same results in time. If it can't be helped, so be it. But I would like to know what causes this. The time increase is rather significant. The operator overload below is the exact code used:
private uint _num;
public static Seq operator ++(Seq a)
{
a._num++; return a;
}
I chose to mutate the instance itself (even if this goes against guidelines) rather than return a new instance, because returning a new instance takes even longer.
This struct will be incremented very frequently, so I'm looking for the reason for this increased processing time.

It's simply a matter of how smart the jitter is. For a regular local int variable, the statement
x++;
can in many cases be reduced to a single machine instruction because the local variable could be enregistered. If it is not enregistered then the sequence of instructions will be to load the value, increment it, and store it, so a handful of instructions.
But overloading ++ on a struct has the following semantics. Suppose we have our struct in s and we say s++. That means effectively that we implement
s = S.operator++(s);
What does that do?
1. Make a copy of s in the local variable location that is the new formal parameter.
2. Store away any register state that is going to be overwritten by the callee.
3. Execute the call instruction.
4. Load the value of the formal, increment it, store it.
5. Copy the new value to the location reserved for the return value.
6. Execute the return instruction.
7. Restore the state of the previous activation frame.
8. Copy the returned value to the location for s.
So your fast program is doing step 4. Your slow program is doing steps 1 through 8, and is about eight times slower. The jitter could identify that this is a candidate for inlining and get rid of some of those costs, but it is by no means required to, and there are plenty of reasons why it might choose not to inline. The jitter does not know that this increment is important to you.
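To make the copy-in, copy-out pattern concrete, here is a minimal sketch of what s++ amounts to for a struct. The names Value, Increment, and SeqDemo are illustrative and not from the question, and the AggressiveInlining attribute is only a hint that the jitter is free to ignore.

```csharp
using System;
using System.Runtime.CompilerServices;

public struct Seq
{
    private uint _num;
    public uint Value => _num;

    // Hint (not a guarantee) that the jitter should inline this operator.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Seq operator ++(Seq a)
    {
        a._num++;   // 'a' is a copy of the caller's struct
        return a;   // the modified copy is returned by value
    }

    // Plain static method with the same body, to show what s++ expands to.
    public static Seq Increment(Seq a)
    {
        a._num++;
        return a;
    }
}

public static class SeqDemo
{
    public static uint Run()
    {
        var s = new Seq();
        s++;                    // roughly: s = Seq.op_Increment(s);
        s = Seq.Increment(s);   // the same pattern written by hand
        return s.Value;         // 2
    }
}
```

Whether the hint removes the call overhead depends on the runtime and build configuration, so it is worth measuring in a Release build rather than assuming.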

I think this is because your Seq is a struct (a value type) and because of the way the increment operator works. As you can see, public static Seq operator ++(Seq a) { ... } returns an instance of your Seq struct. However, as structs are passed by value, it actually creates a new instance of Seq, which is returned, and that is your overhead.
Take a look at another example:
struct SeqStruct
{
private uint _num;
public void Increment() => _num++;
}
// ----------------------------------
var seq = new SeqStruct();
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < 100000000; i++)
seq.Increment();
stopwatch.Stop();
Now, if you measure the invocation time of the Increment() method, you may see that it is closer to a "pure" uint increment, and if you switch to the Release build configuration, you will get the same time as a "pure" uint increment (because the method is inlined).
Another option is to use class instead of struct:
class SeqClass
{
private uint _num;
public static SeqClass operator ++(SeqClass a) { a._num++; return a; }
}
Now the increment will be faster too.

First off, this is the kind of question that calls for linking the performance rant: https://ericlippert.com/2012/12/17/performance-rant/ Questions of this kind fall into premature-optimisation, does-not-really-matter territory.
As for the reason why: aside from issues with measuring properly, there is at least the function call overhead. It does not matter how far you are removed from naked pointers; internally there are still two jumps and an add to and a retrieve from the function stack happening.
Or at least it happens most of the time. The thing is that the JiT could go and inline that function call. It might even partially recompile to make this change. It is really hard to predict if and when it does that.

Related

Memory Allocation Time (The Fast Way)

For a really simple code snippet, I'm trying to see how much of the time is spent actually allocating objects on the small object heap (SOH).
static void Main(string[] args)
{
const int noNumbers = 10000000; // 10 mil
ArrayList numbers = new ArrayList();
Random random = new Random(1); // use the same seed as to make
// benchmarking consistent
for (int i = 0; i < noNumbers; i++)
{
int currentNumber = random.Next(10); // generate a non-negative
// random number less than 10
object o = currentNumber; // BOXING occurs here
numbers.Add(o);
}
}
In particular, I want to know how much time is spent allocating space for the all the boxed int instances on the heap (I know, this is an ArrayList and there's horrible boxing going on as well - but it's just for educational purposes).
The CLR has 2 ways of performing memory allocations on the SOH: either calling the JIT_TrialAllocSFastMP allocation helper (for multi-processor systems; ...SFastSP for single-processor ones), which is really fast since it consists of a few assembly instructions, or falling back to the slower JIT_New allocation helper.
PerfView sees just fine the JIT_New being invoked:
However, I can't figure out which native function, if any, is involved in the "quick way" of allocating. I certainly don't see any JIT_TrialAllocSFastMP. I've already tried raising the loop count (from 10 to 500 mil) in the hope of increasing my chances of getting a glimpse of a few stacks containing the elusive function, but to no avail.
Another approach was to use JetBrains dotTrace (line-by-line) performance viewer, but it falls short of what I want: I do get to see the approximate time it takes the boxing operation for each int, but 1) it's just a bar and 2) there's both the allocation itself and the copying of the value (of which the latter is not what I'm after).
Using the JetBrains dotTrace Timeline viewer won't work either, since they currently don't (quite) support native callstacks.
At this point it's unclear to me whether there's a method being dynamically generated and called when JIT_TrialAllocSFastMP is invoked (and by some miracle none of the PerfView-collected stack frames, one every 1 ms, ever captures it), or whether the Main method's body somehow gets patched and those few assembly instructions mentioned above are injected directly into the code. It's also hard to believe that the fast way of allocating memory is never called.
You could ask "But you already have the .NET Core CLR code, why can't you figure it out yourself?". Since the .NET Framework CLR code is not publicly available, I've looked into its sibling, the .NET Core version of the CLR (as Matt Warren recommends in his step 6 here). The \src\vm\amd64\JitHelpers_InlineGetThread.asm file contains a JIT_TrialAllocSFastMP_InlineGetThread function. The issue is that parsing and understanding the C++ code there is above my grade, and I also can't really think of a way to "Step Into" and see how the JIT-ed code is generated, since this is way lower-level than your usual Press-F11-in-Visual-Studio.
Update 1: Let's simplify the code, and only consider individual boxed int values:
const int noNumbers = 10000000; // 10 mil
object o = null;
for (int i=0;i<noNumbers;i++)
{
o = i;
}
Since this is a Release build, and dead code elimination could kick in, WinDbg is used to check the final machine code.
The resulting JITed code, whose main loop is highlighted in blue below, which simply does repeated boxing, shows that the method that handles the memory allocation is not inlined (note the call to hex address 00af30f4):
This method in turn tries to allocate via the "fast" way, and if that fails, goes back to the "slow" way of a call to JIT_New itself):
It's interesting how the call stack in PerfView obtained from the code above doesn't show any intermediary method between the level of Main and the JIT_New entry itself (given that Main doesn't directly call JIT_New):

Store data inside long number or class instance for better performance

I'm writing an AI for my puzzle game and I'm facing the following situation:
Currently, I have a class, Move, which represents a move in my game, which has similar logic to chess.
In the Move class, I'm storing the following data:
The move player color.
The moving piece.
The origin position on the board.
The destination position on the board.
The piece that has been killed by performing this move (if any).
The move score.
In addition, I have some methods which describe a move, such as IsResigned, Undo etc.
This move instance is passed along in my AI, which is based on the Alpha Beta algorithm. Therefore, the move instance is passed many times, and I'm constructing a great many Move class instances throughout the AI implementation. Thus, I'm afraid that it may have a significant impact on performance.
To reduce the performance impact, I thought about the following solution:
Instead of using instances of the Move class, I'll store my entire move data inside a long number (using bitwise operations), and then will extract the information as needed.
For instance:
- Player color will be from bit 1 - 2 (1 bit).
- Origin position will be from bit 2 - 12 (10 bits).
and so on.
See this example:
public long GenerateMove(PlayerColor color, int origin, int destination) {
return ((int)color) | (origin << 10) | (destination << 20);
}
public PlayerColor GetColor(long move) {
return (PlayerColor)(move & 0x1); // explicit cast: a long does not convert implicitly to an enum
}
public int GetOrigin(long move) {
return (int)((move >> 10) & 0x3f);
}
public int GetDestination(long move) {
return (int)((move >> 20) & 0x3f);
}
Using this method, I can pass along the AI just long numbers, instead of class instances.
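A minimal round-trip sketch of the packing idea follows. The bit layout (bit 0 for color, bits 1 to 10 for origin, bits 11 to 20 for destination), the PlayerColor members, and the PackedMove class name are assumptions made for illustration; they are not the layout used in the snippet above.

```csharp
using System;

public enum PlayerColor { White = 0, Black = 1 }

public static class PackedMove
{
    // Assumed layout: bit 0 = color, bits 1-10 = origin, bits 11-20 = destination.
    private const int OriginShift = 1;
    private const int DestinationShift = 11;
    private const long TenBitMask = 0x3FF;

    public static long Pack(PlayerColor color, int origin, int destination)
    {
        // Mask each field before shifting so an out-of-range value
        // cannot spill into a neighbouring field.
        return (long)color
             | ((origin & TenBitMask) << OriginShift)
             | ((destination & TenBitMask) << DestinationShift);
    }

    public static PlayerColor GetColor(long move) => (PlayerColor)(move & 0x1);

    public static int GetOrigin(long move) => (int)((move >> OriginShift) & TenBitMask);

    public static int GetDestination(long move) => (int)((move >> DestinationShift) & TenBitMask);
}
```

The important property is that every shift and mask used when packing is mirrored exactly when unpacking; keeping them as named constants makes that easier to check.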
However, I have some doubts: setting aside the added complexity to the program, class instances are passed in C# by reference (i.e. by sending a pointer to that address). So does my alternative method even make sense? The case is even worse, since I'm using long numbers here (64 bits), but the pointer address may be represented as an integer (32 bits), so it may even have worse performance than my current implementation.
What is your opinion about this alternative method?

There are a couple of things to say here:
1. Are you actually having performance problems (and are you sure memory usage is the reason)? Memory allocation for new instances is very cheap in .NET and normally you will not notice garbage collection. So you might be barking up the wrong tree here.
2. When you pass an instance of a reference type, you are just passing a reference; when you store a reference type (e.g. in an array), you just store the reference. So unless you create a lot of distinct instances or copy the data into new instances, passing the reference does not increase heap size. So passing references might be the most efficient way to go.
3. If you create a lot of copies and discard them quickly and you are afraid of the memory impact (again, do you face actual problems?), you can create value types (struct instead of class). But you have to be aware of the value type semantics (you are always working on copies).
4. You cannot rely on a reference being 32 bits. On a 64-bit system, it will be 64 bits.
5. I would strongly advise against storing the data in an integer variable. It makes your code less maintainable, and in most cases that is not worth the performance tradeoff. Unless you are in severe trouble, don't do it.
6. If you don't want to give up on the idea of using a numeric value, at least use a struct that is composed of two System.Collections.Specialized.BitVector32 instances. This is a built-in .NET type that will do the mask and shift operations for you. In that struct you can also encapsulate access to the values in properties, so you can keep this rather unusual way of storing your values away from your other code.
UPDATE:
I would recommend you use a profiler to see where the performance problems are. It is almost impossible (and definitely not a good use of your time) to use guesswork for performance optimization. Once you see the profiler results, you'll probably be surprised about the reason for your problems. I would bet that memory usage or memory allocation is not it.
In case you actually come to the conclusion that memory consumption of your Move instances is the reason and using small value types would solve the problem (I'd be surprised), don't use an Int64, use a custom struct (as described in 6.) like the following, that will be the same size as an Int64:
[System.Runtime.InteropServices.StructLayout( System.Runtime.InteropServices.LayoutKind.Sequential, Pack = 4 )]
public struct Move {
private static readonly BitVector32.Section SEC_COLOR = BitVector32.CreateSection( 1 );
private static readonly BitVector32.Section SEC_ORIGIN = BitVector32.CreateSection( 63, SEC_COLOR );
private static readonly BitVector32.Section SEC_DESTINATION = BitVector32.CreateSection( 63, SEC_ORIGIN );
private BitVector32 low;
private BitVector32 high;
public PlayerColor Color {
get {
return (PlayerColor)low[ SEC_COLOR ];
}
set {
low[ SEC_COLOR ] = (int)value;
}
}
public int Origin {
get {
return low[ SEC_ORIGIN ];
}
set {
low[ SEC_ORIGIN ] = value;
}
}
public int Destination {
get {
return low[ SEC_DESTINATION ];
}
set {
low[ SEC_DESTINATION ] = value;
}
}
}
But be aware that you are now using a value type, so you have to use it accordingly. That means assignments create copies of the original (i.e. changing the destination value will leave the source unchanged), you need ref parameters if you want changes made by subroutines to persist, and you should avoid boxing at any cost to prevent even worse performance (some operations cause boxing even though you won't immediately notice, e.g. passing a struct that implements an interface as an argument of the interface type). Using structs (just like using Int64) will only be worth it when you create a lot of temporary values which you quickly throw away. And then you'll still need to confirm with a profiler that your performance has actually improved.
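The three pitfalls just mentioned (copy on assignment, by-value parameters, boxing) can be seen in a small sketch. Position and ValueSemanticsDemo are hypothetical names used only for this illustration.

```csharp
using System;

public struct Position
{
    public int Square;
}

public static class ValueSemanticsDemo
{
    // Receives a copy; the caller's struct is untouched.
    public static void MoveByValue(Position p) { p.Square = 42; }

    // Receives a reference to the caller's struct; the change persists.
    public static void MoveByRef(ref Position p) { p.Square = 42; }

    public static int[] Run()
    {
        var a = new Position { Square = 1 };
        var b = a;          // assignment copies the whole struct
        b.Square = 2;       // a.Square is still 1

        var c = new Position { Square = 1 };
        MoveByValue(c);     // c.Square is still 1

        var d = new Position { Square = 1 };
        MoveByRef(ref d);   // d.Square is now 42

        object boxed = d;   // boxing: allocates a heap copy of d
        d.Square = 7;       // the boxed copy still holds 42

        return new[] { a.Square, c.Square, d.Square, ((Position)boxed).Square };
    }
}
```

Run() returns 1, 1, 7, 42: the original of an assignment and the argument of a by-value call are unchanged, the ref call persists, and the boxed object is an independent copy.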

Does creating new Processes help me for Traversing a big tree?

Let's think of it as a family tree, a father has kids, those kids have kids, those kids have kids, etc...
So I have a recursive function that takes the father and uses recursion to get the children, and for now it just prints them to the debug output window. But at some point (after letting it run for an hour and printing about 26000 rows) it gives me a StackOverflowException.
So am I really running out of memory? Then shouldn't I get an "Out of memory exception"? In other posts I found people saying that if the number of recursive calls is too large, you might still get a SOF exception...
Anyway, my first thought was to break the tree into smaller sub-trees. I know for a fact that my root father always has these five kids, so instead of calling my method once with the root passed to it, I call it five times with the kids of the root passed to it. It helped, I think, but one of them is still so big (26000 rows when it crashes) that I still have this issue.
How about Application Domains and Creating new Processes at run time at some certain level of depth? Does that help?
How about creating my own Stack and using that instead of recursive methods? does that help?
Here is also a high-level view of my code. Please take a look; maybe there is actually something silly wrong with it that causes the SOF error:
private void MyLoadMethod(string conceptCKI)
{
// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node.
// when this is zero, it means its a leaf.
int numberofKids = moTargetConceptList2.ConceptReltns.Count();
if (numberofKids == 0)
return;
for (int i = 1; i <= numberofKids; i++)
{
oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);
//Get the concept linked to the relation concept
if (oUCMRConceptReltn.SourceCKI == conceptCKI)
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
}
else
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
}
//builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);
MyLoadMethod(oConcept.ConceptCKI);
}
}

How about creating my own Stack and using that instead of recursive methods? does that help?
Yes!
When you instantiate a Stack<T> this will live on the heap and can grow arbitrarily large (until you run out of addressable memory).
If you use recursion you use the call stack. The call stack is much smaller than the heap. The default is 1 MB of call stack space per thread. Note this can be changed, but it's not advisable.
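As a minimal sketch of that idea, the walk below keeps its pending nodes on a heap-allocated Stack<T>, so the depth of the tree no longer consumes call stack space. The Node class, TreeWalker, and BuildChain are hypothetical stand-ins for the concept objects in the question.

```csharp
using System;
using System.Collections.Generic;

public class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class TreeWalker
{
    // Depth-first walk that keeps pending nodes on a heap-allocated Stack<T>
    // instead of on the thread's call stack.
    public static List<string> Walk(Node root)
    {
        var visited = new List<string>();
        var pending = new Stack<Node>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            visited.Add(current.Name);

            // Push children in reverse so they are popped in their original order.
            for (int i = current.Children.Count - 1; i >= 0; i--)
                pending.Push(current.Children[i]);
        }
        return visited;
    }

    // Builds a chain 'depth' nodes deep; hypothetical test data.
    public static Node BuildChain(int depth)
    {
        var root = new Node("0");
        Node current = root;
        for (int i = 1; i < depth; i++)
        {
            var child = new Node(i.ToString());
            current.Children.Add(child);
            current = child;
        }
        return root;
    }
}
```

A chain built with BuildChain(100000) is walked without deep recursion, because each loop iteration uses the same stack frame.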

StackOverflowException is quite different to OutOfMemoryException.
OOME means that there is no memory available to the process at all. This could be upon trying to create a new thread with a new stack, or in trying to create a new object on the heap (and a few other cases).
SOE means that the thread's stack - by default 1M, though it can be set differently in thread creation or if the executable has a different default; hence ASP.NET threads have 256k as a default rather than 1M - was exhausted. This could be upon calling a method, or allocating a local.
When you call a function (method or property), the arguments of the call are placed on the stack, the address the function should return to when it returns is put on the stack, and then execution jumps to the function called. Then some locals will be placed on the stack. Some more may be placed on it as the function continues to execute. stackalloc will also explicitly use some stack space where otherwise heap allocation would be used.
Then it calls another function, and the same happens again. Then that function returns, and execution jumps back to the stored return address, and the pointer within the stack moves back up (no need to clean up the values placed on the stack, they're just ignored now) and that space is available again.
If you use up that 1M of space, you get a StackOverflowException. Because 1M (or even 256k) is a large amount of memory for such use (we don't put really large objects on the stack), the four things that are likely to cause an SOE are:
1. Someone thought it would be a good idea to optimise by using stackalloc when it wasn't, and they used up that 1M fast.
2. Someone thought it would be a good idea to optimise by creating a thread with a smaller than usual stack when it wasn't, and they used up that tiny stack.
3. A recursive call (whether direct or through several steps) falls into an infinite loop.
4. The recursion wasn't quite infinite, but it was large enough.
You've got case 4. 1 and 2 are quite rare (and you need to be quite deliberate to risk them). Case 3 is by far the most common, and indicates a bug in that the recursion shouldn't be infinite, but a mistake means it is.
Ironically, in this case you should be glad you took the recursive approach rather than iterative - the SOE reveals the bug and where it is, while with an iterative approach you'd probably have an infinite loop bringing everything to a halt, and that can be harder to find.
Now for case 4, we've got two options. In the very very rare cases where we've got just slightly too many calls, we can run it on a thread with a larger stack. This doesn't apply to you.
Instead, you need to change from a recursive approach to an iterative one. Most of the time, this isn't very hard, though it can be fiddly. Instead of calling itself again, the method uses a loop. For example, consider the classic teaching example of a factorial method:
private static int Fac(int n)
{
return n <= 1 ? 1 : n * Fac(n - 1);
}
Instead of using recursion we loop in the same method:
private static int Fac(int n)
{
int ret = 1;
for(int i = 1; i <= n; ++i)
ret *= i;
return ret;
}
You can see why there's less stack space here. The iterative version will also be faster 99% of the time. Now, imagine we accidentally call Fac(n) in the first, and leave out the ++i in the second - the equivalent bug in each, and it causes an SOE in the first and a program that never stops in the second.
For the sort of code you're talking about, where you keep producing more and more results as you go based on previous results, you can place the results you've got in a data structure (Queue<T> and Stack<T> both serve well for a lot of cases) so the code becomes something like:
private void MyLoadMethod(string firstConceptCKI)
{
Queue<string> pendingItems = new Queue<string>();
pendingItems.Enqueue(firstConceptCKI);
while(pendingItems.Count != 0)
{
string conceptCKI = pendingItems.Dequeue();
// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node.
// when this is zero, it means its a leaf.
int numberofKids = moTargetConceptList2.ConceptReltns.Count();
for (int i = 1; i <= numberofKids; i++)
{
oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);
//Get the concept linked to the relation concept
if (oUCMRConceptReltn.SourceCKI == conceptCKI)
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
}
else
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
}
//builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);
pendingItems.Enqueue(oConcept.ConceptCKI);
}
}
}
(I haven't completely checked this, just added the queuing instead of recursing to the code in your question).
This should then do more or less the same as your code, but iteratively. Hopefully that means it'll work. Note that there is a possible infinite loop in this code if the data you are retrieving has a loop. In that case this code will throw an exception when it fills the queue with far too much stuff to cope. You can either debug the source data, or use a HashSet to avoid enqueuing items that have already been processed.
Edit: Better add how to use a HashSet to catch duplicates. First set up a HashSet, this could just be:
HashSet<string> seen = new HashSet<string>();
Or if the strings are used case-insensitively, you'd be better with:
HashSet<string> seen = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase); // or StringComparer.CurrentCultureIgnoreCase if that's closer to how the string is used in the rest of the code.
Then before you go to use the string (or perhaps before you go to add it to the queue), you have one of the following:
If duplicate strings shouldn't happen:
if(!seen.Add(conceptCKI))
throw new InvalidOperationException("Attempt to use \"" + conceptCKI + "\" which was already seen.");
Or if duplicate strings are valid, and we just want to skip performing the second call:
if(!seen.Add(conceptCKI))
continue;//skip rest of loop, and move on to the next one.
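Putting the queue and the HashSet together, a self-contained sketch might look like the following. The links dictionary is a hypothetical stand-in for the database lookups in the question, and the choice of an ordinal case-insensitive comparer is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class GraphWalker
{
    // 'links' maps each key to the keys it is related to; a hypothetical
    // stand-in for the database lookups in the question.
    public static List<string> Walk(string start, Dictionary<string, string[]> links)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var pending = new Queue<string>();
        var order = new List<string>();

        seen.Add(start);
        pending.Enqueue(start);

        while (pending.Count != 0)
        {
            string current = pending.Dequeue();
            order.Add(current);

            if (!links.TryGetValue(current, out string[] related))
                continue;   // leaf: nothing to enqueue

            foreach (string next in related)
            {
                // Add returns false when the key was already seen,
                // so a cycle in the data cannot grow the queue forever.
                if (seen.Add(next))
                    pending.Enqueue(next);
            }
        }
        return order;
    }
}
```

Checking before enqueuing, rather than after dequeuing, keeps the queue from ever holding the same key twice.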

I think you have a ring in your recursion (infinite recursion), not a genuine stack overflow. Even if you had more memory for the stack, you would still get the overflow error.
To test it:
Declare a field that stores the objects currently being processed:
private Dictionary<int,object> _operableIds = new Dictionary<int,object>();
...
private void Start()
{
_operableIds.Clear();
Recursion(start_id);
}
...
private void Recursion(int object_id)
{
if(_operableIds.ContainsKey(object_id))
throw new Exception("Have a ring!");
else
_operableIds.Add(object_id, null/*or object*/);
...
Recursion(other_id);
...
_operableIds.Remove(object_id);
}

In C#, does copying a member variable to a local stack variable improve performance?

I quite often write code that copies member variables to a local stack variable in the belief that it will improve performance by removing the pointer dereference that has to take place whenever accessing member variables.
Is this valid?
For example
public class Manager {
private readonly Constraint[] mConstraints;
public void DoSomethingPossiblyFaster()
{
var constraints = mConstraints;
for (var i = 0; i < constraints.Length; i++)
{
var constraint = constraints[i];
// Do something with it
}
}
public void DoSomethingPossiblySlower()
{
for (var i = 0; i < mConstraints.Length; i++)
{
var constraint = mConstraints[i];
// Do something with it
}
}
}
My thinking is that DoSomethingPossiblyFaster is actually faster than DoSomethingPossiblySlower.
I know this is pretty much a micro optimisation, but it would be useful to have a definitive answer.
Edit
Just to add a little bit of background around this. Our application has to process a lot of data coming from telecom networks, and this method is likely to be called about 1 billion times a day for some of our servers. My view is that every little helps, and sometimes all I am trying to do is give the compiler a few hints.

Which is more readable? That should usually be your primary motivating factor. Do you even need to use a for loop instead of foreach?
As mConstraints is readonly I'd potentially expect the JIT compiler to do this for you - but really, what are you doing in the loop? The chances of this being significant are pretty small. I'd almost always pick the second approach simply for readability - and I'd prefer foreach where possible. Whether the JIT compiler optimizes this case will very much depend on the JIT itself - which may vary between versions, architectures, and even how large the method is or other factors. There can be no "definitive" answer here, as it's always possible that an alternative JIT will optimize differently.
If you think you're in a corner case where this really matters, you should benchmark it - thoroughly, with as realistic data as possible. Only then should you change your code away from the most readable form. If you're "quite often" writing code like this, it seems unlikely that you're doing yourself any favours.
Even if the readability difference is relatively small, I'd say it's still present and significant - whereas I'd certainly expect the performance difference to be negligible.

If the compiler/JIT isn't already doing this or a similar optimization for you (this is a big if), then DoSomethingPossiblyFaster should be faster than DoSomethingPossiblySlower. The best way to explain why is to look at a rough translation of the C# code to straight C.
When a non-static member function is called, a hidden pointer to this is passed into the function. You'd have roughly the following, ignoring virtual function dispatch since it's irrelevant to the question (or equivalently making Manager sealed for simplicity):
struct Manager {
Constraint* mConstraints;
int mLength;
};
void DoSomethingPossiblyFaster(Manager* this) {
Constraint* constraints = this->mConstraints;
int length = this->mLength;
for (int i = 0; i < length; i++)
{
Constraint constraint = constraints[i];
// Do something with it
}
}
void DoSomethingPossiblySlower(Manager* this)
{
for (int i = 0; i < this->mLength; i++)
{
Constraint constraint = (this->mConstraints)[i];
// Do something with it
}
}
The difference is that in DoSomethingPossiblyFaster, mConstraints lives on the stack and access only requires one layer of pointer indirection, since it's at a fixed offset from the stack pointer. In DoSomethingPossiblySlower, if the compiler misses the optimization opportunity, there's an extra pointer indirection. The compiler has to read a fixed offset from the stack pointer to access this and then read a fixed offset from this to get mConstraints.
There are two possible optimizations that could negate this hit:
The compiler could do exactly what you did manually and cache mConstraints on the stack.
The compiler could store this in a register so that it doesn't need to fetch it from the stack on every loop iteration before dereferencing it. This means that fetching mConstraints from this or from the stack is basically the same operation: A single dereference of a fixed offset from a pointer that's already in a register.

You know the response you will get, right? "Time it."
There is probably not a definitive answer. First, the compiler might do the optimization for you. Second, even if it doesn't, indirect addressing at the assembly level may not be significantly slower. Third, it depends on the cost of making the local copy, compared to the number of loop iterations. Then there are caching effects to consider.
I love to optimize, but this is one place I would definitely say wait until you have a problem, then experiment. This is a possible optimization that can be added when needed, not one of those optimizations that needs to be planned up front to avoid a massive ripple effect later.
Edit: (towards a definitive answer)
Compiling both functions in release mode and examining the IL with IL Dasm shows that in both places the "PossiblyFaster" function uses the local variable, it has one less instruction
ldloc.0 vs
ldarg.0; ldfld class Constraint[] Manager::mConstraints
Of course, this is still one level removed from the machine code - you don't know what the JIT compiler will do for you. But it is likely that "PossiblyFaster" is marginally faster.
However, I still don't recommend adding the extra variable until you are sure this function is the most expensive thing in your system.
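For anyone who does want to "time it", a rough Stopwatch sketch is below. It assumes an int[] in place of Constraint[] and a trivial summing loop body, both of which are simplifications for illustration; the LoopTiming helper is hypothetical, and the numbers it produces will vary with the JIT, architecture, and build configuration.

```csharp
using System;
using System.Diagnostics;

public class Manager
{
    private readonly int[] mConstraints;   // int[] stands in for Constraint[]

    public Manager(int size)
    {
        mConstraints = new int[size];
        for (int i = 0; i < size; i++)
            mConstraints[i] = i % 7;
    }

    // Copies the field to a local before looping.
    public long SumViaLocal()
    {
        var constraints = mConstraints;
        long sum = 0;
        for (var i = 0; i < constraints.Length; i++)
            sum += constraints[i];
        return sum;
    }

    // Reads the field on every iteration (unless the JIT hoists it).
    public long SumViaField()
    {
        long sum = 0;
        for (var i = 0; i < mConstraints.Length; i++)
            sum += mConstraints[i];
        return sum;
    }
}

public static class LoopTiming
{
    public static TimeSpan Time(Func<long> action, int repetitions)
    {
        action();   // warm-up call so JIT compilation is not measured
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
            action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as LoopTiming.Time(manager.SumViaLocal, 1000) compared with LoopTiming.Time(manager.SumViaField, 1000), run in a Release build without a debugger attached, gives a first impression; a dedicated benchmarking tool would give more trustworthy numbers.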

I've profiled this and came up with a bunch of interesting results that are probably only valid for my specific example, but I thought would be worth while noting here.
The fastest is X86 release mode. That runs one iteration of my test in 7.1 seconds, whereas the equivalent X64 code takes 8.6 seconds. This was running 5 iterations, each iteration processing the loop 19.2 million times.
The fastest approach for the loop was:
foreach (var constraint in mConstraints)
{
... do stuff ...
}
The second fastest approach, which massively surprised me was the following
for (var i = 0; i < mConstraints.Length; i++)
{
var constraint = mConstraints[i];
... do stuff ...
}
I guess this was because mConstraints was stored in a register for the loop.
This slowed down when I removed the readonly option for mConstraints.
So, my summary from this is that being readable in this situation does give performance as well.

.NET, get memory used to hold struct instance

It's possible to determine memory usage (according to Jon Skeet's blog)
like this :
public class Program
{
private static void Main()
{
var before = GC.GetTotalMemory(true);
var point = new Point(1, 0);
var after = GC.GetTotalMemory(true);
Console.WriteLine("Memory used: {0} bytes", after - before);
}
#region Nested type: Point
private class Point
{
public int X;
public int Y;
public Point(int x, int y)
{
X = x;
Y = y;
}
}
#endregion
}
It prints Memory used: 16 bytes (I'm running x64 machine).
Suppose we change the Point declaration from class to struct. How do we then determine the memory used? Is it possible at all? I was unable to find anything about getting the stack size in .NET.
P.S
Yes, when changed to 'struct', Point instances will often (not always) be stored on the stack instead of the heap. Sorry for not posting this the first time together with the question.
P.P.S
This situation has no practical use at all (IMHO). I'm just interested in whether it is possible to get the stack (short-term storage) size. I was unable to find any info about it, so I'm asking you, SO experts.

You won't see a change in GetTotalMemory if you create the struct the way you did, since it's going to be part of the thread's stack, and not allocated separately. The GetTotalMemory call will still work, and show you the total allocation size of your process, but the struct will not cause new memory to be allocated.
You can use sizeof(Type) or Marshal.SizeOf to return the size of a struct (in this case, 8 bytes).
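A minimal sketch of the Marshal.SizeOf route, assuming Point is redeclared as a struct with the same two int fields (SizeDemo is a hypothetical name). Note that sizeof on a user-defined struct needs an unsafe context, so only the Marshal call is shown.

```csharp
using System;
using System.Runtime.InteropServices;

public struct Point
{
    public int X;
    public int Y;
}

public static class SizeDemo
{
    public static int PointSize()
    {
        // Marshal.SizeOf reports the unmanaged (marshaled) size of the type;
        // for a struct of two ints that is 8 bytes.
        return Marshal.SizeOf<Point>();
    }
}
```

The marshaled size can differ from the managed layout for types with fields such as bool or char, so treat the result as exact only for simple blittable structs like this one.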

There is a special CPU register, ESP, that contains a pointer to the top of the stack. You can probably find a way to read this register from .NET (using some unsafe or external code). Then just compare the value of this pointer at a given moment with its value at thread start; the difference between them will be a more or less accurate amount of memory used for the thread's stack. Not sure if it really works, just an idea :)

In isolation, as you have done here, you might have a "reasonable" amount of success with this methodology, especially if you run it numerous times to ensure no other piece of code or GC action is affecting the outcome. I am not confident the information is useful, though. Using this methodology in a real-world application is less likely to give accurate results, as there are too many variables.
But realize that this is only "reasonable" and not a certainty.
Why do you need to know the size of objects? Just curious, as knowing the business reason may lead to other alternatives.
