It's possible to determine memory usage (according to Jon Skeet's blog) like this:
public class Program
{
    private static void Main()
    {
        var before = GC.GetTotalMemory(true);
        var point = new Point(1, 0);
        var after = GC.GetTotalMemory(true);
        Console.WriteLine("Memory used: {0} bytes", after - before);
    }

    #region Nested type: Point

    private class Point
    {
        public int X;
        public int Y;

        public Point(int x, int y)
        {
            X = x;
            Y = y;
        }
    }

    #endregion
}
It prints Memory used: 16 bytes (I'm running an x64 machine).
Suppose we change the Point declaration from class to struct. How then can we determine the memory used? Is it possible at all? I was unable to find anything about getting stack size in .NET.
P.S.
Yes, when changed to struct, Point instances will often be stored on the stack (not always) instead of the heap. Sorry for not posting this the first time together with the question.
P.P.S.
This situation has no practical use at all (IMHO); it's just interesting to me whether it is possible to get the stack (short-term storage) size. I was unable to find any info about it, so I'm asking you, SO experts.
You won't see a change in GetTotalMemory if you create the struct the way you did, since it's going to be part of the thread's stack and not allocated separately. The GetTotalMemory call will still work and show you the total number of bytes currently allocated on the managed heap, but the struct will not cause new memory to be allocated.
You can use sizeof(Type) or Marshal.SizeOf to return the size of a struct (in this case, 8 bytes).
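For illustration, a minimal sketch of both calls, assuming Point is redeclared as a struct with two int fields (note that sizeof on a user-defined struct needs an unsafe context, so the project must allow unsafe code):

using System;
using System.Runtime.InteropServices;

struct Point
{
    public int X;
    public int Y;
}

class SizeDemo
{
    // sizeof(Point) on a custom struct requires compiling with unsafe enabled.
    static unsafe void Main()
    {
        Console.WriteLine(sizeof(Point));                   // 8: two 4-byte ints, no object header
        Console.WriteLine(Marshal.SizeOf(typeof(Point)));   // 8: marshaled (unmanaged) size
    }
}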
There is a special CPU register, ESP, that contains a pointer to the top of the stack. Probably you can find a way to read this register from .NET (using some unsafe or external code). Then just compare the value of this pointer at a given moment with its value at thread start - the difference between them will be a more or less accurate amount of memory used for the thread's stack. Not sure if it really works, just an idea :)
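A rough sketch of that idea, taking the address of a local variable as a stand-in for the stack pointer (unsafe, implementation-dependent, and only an approximation):

using System;

class StackUsageDemo
{
    static IntPtr _start;

    // The address of a local is a crude proxy for the current stack position.
    // Requires compiling with unsafe enabled.
    static unsafe IntPtr CurrentStackPosition()
    {
        int marker = 0;
        return (IntPtr)(&marker);
    }

    static void Main()
    {
        _start = CurrentStackPosition();   // position near the start of Main
        Recurse(100);
    }

    static void Recurse(int depth)
    {
        if (depth == 0)
        {
            // The stack grows downwards on x86/x64, so start - current is roughly the bytes used.
            long used = (long)_start - (long)CurrentStackPosition();
            Console.WriteLine("Approximate stack bytes used: {0}", used);
            return;
        }
        Recurse(depth - 1);
    }
}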
In isolation, as you have done here, you might have a "reasonable" amount of success with this methodology. I am not confident the information is useful, but the methodology can work, especially if you run it numerous times to make sure no other piece of code or GC action is affecting the outcome. Using this methodology in a real-world application is less likely to give accurate results, however, as there are too many variables.
But realize this is only "reasonable" and not a surety.
Why do you need to know the size of objects? Just curious, as knowing the business reason may lead to other alternatives.
Related
For a really simple code snippet, I'm trying to see how much of the time is spent actually allocating objects on the small object heap (SOH).
static void Main(string[] args)
{
    const int noNumbers = 10000000; // 10 mil

    ArrayList numbers = new ArrayList();
    Random random = new Random(1);  // use the same seed to make benchmarking consistent

    for (int i = 0; i < noNumbers; i++)
    {
        int currentNumber = random.Next(10); // generate a non-negative random number less than 10
        object o = currentNumber;            // BOXING occurs here
        numbers.Add(o);
    }
}
In particular, I want to know how much time is spent allocating space for all the boxed int instances on the heap (I know this is an ArrayList and there's horrible boxing going on as well - but it's just for educational purposes).
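As a side note (this measures bytes rather than the time being asked about), on newer runtimes (.NET Core 2.0+ / .NET Framework 4.8) a quick sanity check of how much the boxing actually allocates can be done with GC.GetAllocatedBytesForCurrentThread - a rough sketch, not part of the original measurement:

long before = GC.GetAllocatedBytesForCurrentThread();

object o = 42;   // one boxing allocation on the SOH

long after = GC.GetAllocatedBytesForCurrentThread();

// Roughly the size of one boxed int (about 24 bytes on x64, including the object header).
Console.WriteLine("Allocated: {0} bytes", after - before);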
The CLR has 2 ways of performing memory allocations on the SOH: either calling the JIT_TrialAllocSFastMP allocation helper (for multi-processor systems; ...SFastSP for single-processor ones) - which is really fast since it consists of a few assembly instructions - or falling back to the slower JIT_New allocation helper.
PerfView has no trouble showing JIT_New being invoked:
However, I can't figure out which - if any - is the native function involved in the "quick way" of allocating. I certainly don't see any JIT_TrialAllocSFastMP. I've already tried raising the count of the loop (from 10 to 500 mil), in the hope of increasing my chances of getting a glimpse of a few stacks containing the elusive function, but to no avail.
Another approach was to use the JetBrains dotTrace (line-by-line) performance viewer, but it falls short of what I want: I do get to see the approximate time the boxing operation takes for each int, but 1) it's just a bar and 2) it covers both the allocation itself and the copying of the value (and the latter is not what I'm after).
Using the JetBrains dotTrace Timeline viewer won't work either, since it currently doesn't (quite) support native call stacks.
At this point it's unclear to me whether there's a method being dynamically generated and called when JIT_TrialAllocSFastMP is invoked - and by some miracle none of the PerfView-collected stack frames (one every 1 ms) ever capture it - or whether Main's method body somehow gets patched and those few assembly instructions mentioned above are injected directly into the code. It's also hard to believe that the fast way of allocating memory is never called.
You could ask "But you already have the .NET Core CLR code, why can't you figure it out yourself?". Since the .NET Framework CLR code is not publicly available, I've looked into its sibling, the .NET Core version of the CLR (as Matt Warren recommends in step 6 here). The \src\vm\amd64\JitHelpers_InlineGetThread.asm file contains a JIT_TrialAllocSFastMP_InlineGetThread function. The issue is that parsing/understanding the code there is above my pay grade, and I also can't really think of a way to "Step Into" and see how the JIT-ed code is generated, since this is way lower-level than your usual press-F11-in-Visual-Studio.
Update 1: Let's simplify the code, and only consider individual boxed int values:
const int noNumbers = 10000000; // 10 mil

object o = null;
for (int i = 0; i < noNumbers; i++)
{
    o = i;
}
Since this is a Release build, and dead code elimination could kick in, WinDbg is used to check the final machine code.
The resulting JITed code, whose main loop is highlighted in blue below and which simply does repeated boxing, shows that the method that handles the memory allocation is not inlined (note the call to hex address 00af30f4):
This method in turn tries to allocate via the "fast" way and, if that fails, falls back to the "slow" way of a call to JIT_New itself:
It's interesting how the call stack obtained in PerfView from the code above doesn't show any intermediary method between Main and the JIT_New entry itself (given that Main doesn't directly call JIT_New):
Why does incrementing a uint by one 100,000,000 times take ~0.175 seconds, while incrementing a uint within a struct the same number of times takes ~1.21 seconds?
The tests have been conducted roughly 10 times with nearly the same results each time. If it can't be helped, so be it. But I would like to know what causes this, as the time increase is rather significant. The operator overload below is the exact code used:
private uint _num;

public static Seq operator ++(Seq a)
{
    a._num++;
    return a;
}
I chose to mutate the instance itself (even if this goes against guidelines) rather than returning a new instance, because returning a new instance also takes quite a bit longer.
This struct will be incremented very frequently, thus I'm looking for the reason to this increased processing time.
It's simply a matter of how smart the jitter is. For a regular local int variable, the statement
x++;
can in many cases be reduced to a single machine instruction because the local variable could be enregistered. If it is not enregistered then the sequence of instructions will be to load the value, increment it, and store it, so a handful of instructions.
But overloading ++ on a struct has the following semantics. Suppose we have our struct in s and we say s++. That means effectively that we implement
s = S.operator++(s);
What does that do?
1. Make a copy of s to the local variable location that is the new formal parameter
2. Store away any register state that is going to be overwritten by the callee
3. Execute the call instruction
4. Load the value of the formal, increment it, store it
5. Copy the new value to the location reserved for the return value
6. Execute the return instruction
7. Restore the state of the previous activation frame
8. Copy the returned value to the location for s
So your fast program is doing step 4. Your slow program is doing steps 1 through 8, and is about eight times slower. The jitter could identify that this is a candidate for inlining and get rid of some of those costs, but it is by no means required to, and there are plenty of reasons why it might choose not to inline. The jitter does not know that this increment is important to you.
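If that call overhead really matters, one thing to try (a sketch, and still only a hint to the jitter, not a guarantee) is marking the operator with MethodImplOptions.AggressiveInlining:

using System.Runtime.CompilerServices;

public struct Seq
{
    private uint _num;

    // A hint only: the jitter may still decide not to inline the call.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Seq operator ++(Seq a)
    {
        a._num++;
        return a;
    }
}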
I think this is because your Seq is a struct (value type) and because of the way the increment operator works. As you can see, public static Seq operator ++(Seq a) { ... } returns an instance of your Seq struct. However, as structs are passed by value, it actually creates a new instance of Seq which is returned, and here is your overhead.
Take a look at another example:
struct SeqStruct
{
    private uint _num;

    public void Increment() => _num++;
}

// ----------------------------------

var seq = new SeqStruct();
var stopwatch = Stopwatch.StartNew();

for (int i = 0; i < 100000000; i++)
    seq.Increment();

stopwatch.Stop();
Now, if you measure the time taken by invocations of the Increment() method, you will see that it is now closer to "pure" uint incrementation, and if you switch to the Release build configuration, you will get the same time as "pure" uint incrementation (this method gets inlined).
Another option is to use a class instead of a struct:
class SeqClass
{
    private uint _num;

    public static SeqClass operator ++(SeqClass a) { a._num++; return a; }
}
Now the incrementation will be faster too.
First off, this is the kind of question that calls for linking the performance rant: https://ericlippert.com/2012/12/17/performance-rant/ These kinds of questions fall into premature-optimisation/does-not-really-matter territory.
As for the reason why: aside from issues with measuring properly, there is at least a function call overhead. It does not matter how far removed you are from naked pointers; internally there are still two jumps and an add/retrieve on the function stack happening.
Or at least that happens most of the time. The thing is that the JIT could go and inline that function call. It might even partially recompile to make this change. It is really hard to predict if and when it does that.
I'm writing an AI for my puzzle game and I'm facing the following situation:
Currently, I have a class, Move, which represents a move in my game, which has logic similar to chess.
In the Move class, I'm storing the following data:
The moving player's color.
The moving piece.
The origin position on the board.
The destination position on the board.
The piece that has been killed by performing this move (if any).
The move score.
In addition, I have some methods which describe a move, such as IsResigned, Undo, etc.
This move instance is passed along in my AI, which is based on the Alpha-Beta algorithm. Therefore, the move instance is passed around many times, and I'm constructing many, many Move class instances over the course of the AI implementation. Thus, I'm afraid that it may have a significant impact on performance.
To reduce the performance impact, I thought about the following solution:
Instead of using instances of the Move class, I'll store my entire move data inside a long number (using bitwise operations), and then extract the information as needed.
For instance:
- Player color will be from bit 1 - 2 (1 bit).
- Origin position will be from bit 2 - 12 (10 bits).
and so on.
See this example:
public long GenerateMove(PlayerColor color, int origin, int destination) {
    return ((int)color) | (origin << 10) | (destination << 20);
}

public PlayerColor GetColor(long move) {
    return (PlayerColor)(move & 0x1);
}

public int GetOrigin(long move) {
    return (int)((move >> 10) & 0x3FF);   // 10 bits
}

public int GetDestination(long move) {
    return (int)((move >> 20) & 0x3FF);   // 10 bits
}
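For example, a round trip through the helpers above could look like this (MoveEncoder is just a made-up name for whatever class holds these methods, and PlayerColor is assumed to be an enum whose White value maps to 0):

// Hypothetical usage, assuming the helpers live in a class called MoveEncoder.
var encoder = new MoveEncoder();

long move = encoder.GenerateMove(PlayerColor.White, origin: 12, destination: 27);

PlayerColor color = encoder.GetColor(move);       // PlayerColor.White (assuming White == 0)
int origin = encoder.GetOrigin(move);             // 12
int destination = encoder.GetDestination(move);   // 27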
Using this method, I can pass plain long numbers around the AI instead of class instances.
However, I have some doubts: put aside the added complexity to the program, class instances are passed in C# by reference (i.e. by sending a pointer to that address). So does my alternative method even make sense? The case may even be worse, since I'm using long numbers here (64 bits), but the pointer address may be represented as an integer (32 bits) - so it may even perform worse than my current implementation.
What is your opinion of this alternative method?
There are a couple of things to say here:
1. Are you actually having performance problems (and are you sure memory usage is the reason)? Memory allocation for new instances is very cheap in .NET, and normally you will not notice garbage collection. So you might be barking up the wrong tree here.
2. When you pass an instance of a reference type, you are just passing a reference; when you store a reference type (e.g. in an array), you will just store the reference. So unless you create a lot of distinct instances or copy the data into new instances, passing the reference does not increase heap size. So passing references might be the most efficient way to go.
3. If you create a lot of copies and discard them quickly and you are afraid of the memory impact (again, do you face actual problems?), you can create value types (struct instead of class). But you have to be aware of value type semantics (you are always working on copies).
4. You can not rely on a reference being 32 bit. On a 64 bit system, it will be 64 bit (see the snippet after this list).
5. I would strongly advise against storing the data in an integer variable. It makes your code less maintainable, and that is in most cases not worth the performance tradeoff. Unless you are in severe trouble, don't do it.
6. If you don't want to give up on the idea of using a numeric value, at least use a struct that is composed of two System.Collections.Specialized.BitVector32 instances. This is a built-in .NET type that will do the mask and shift operations for you. In that struct you can also encapsulate access to the values in properties, so you can keep this rather unusual way of storing your values away from your other code.
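A tiny check for point 4 (a sketch):

using System;

class PointerSizeCheck
{
    static void Main()
    {
        // 4 in a 32-bit process, 8 in a 64-bit process: references are pointer-sized.
        Console.WriteLine("Reference/pointer size: {0} bytes", IntPtr.Size);
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);
    }
}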
UPDATE:
I would recommend you use a profiler to see where the performance problems are. It is almost impossible (and definitely not a good use of your time) to use guesswork for performance optimization. Once you see the profiler results, you'll probably be surprised at the reason for your problems. I would bet that memory usage or memory allocation is not it.
In case you actually come to the conclusion that memory consumption of your Move instances is the reason and using small value types would solve the problem (I'd be surprised), don't use an Int64; use a custom struct (as described in point 6) like the following, which will be the same size as an Int64:
[System.Runtime.InteropServices.StructLayout( System.Runtime.InteropServices.LayoutKind.Sequential, Pack = 4 )]
public struct Move {
    private static readonly BitVector32.Section SEC_COLOR = BitVector32.CreateSection( 1 );
    private static readonly BitVector32.Section SEC_ORIGIN = BitVector32.CreateSection( 63, SEC_COLOR );
    private static readonly BitVector32.Section SEC_DESTINATION = BitVector32.CreateSection( 63, SEC_ORIGIN );

    private BitVector32 low;
    private BitVector32 high;

    public PlayerColor Color {
        get {
            return (PlayerColor)low[ SEC_COLOR ];
        }
        set {
            low[ SEC_COLOR ] = (int)value;
        }
    }

    public int Origin {
        get {
            return low[ SEC_ORIGIN ];
        }
        set {
            low[ SEC_ORIGIN ] = value;
        }
    }

    public int Destination {
        get {
            return low[ SEC_DESTINATION ];
        }
        set {
            low[ SEC_DESTINATION ] = value;
        }
    }
}
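A short usage sketch of the struct above (PlayerColor is assumed to be an enum whose values fit in the 1-bit color section; note the value-type semantics):

// Hypothetical usage; assignments copy the packed data.
var move = new Move { Color = PlayerColor.Black, Origin = 12, Destination = 27 };

var copy = move;          // copies all the packed bits
copy.Destination = 5;     // does not affect 'move'

Console.WriteLine(move.Destination);   // 27
Console.WriteLine(copy.Destination);   // 5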
But be aware that you are now using a value type, so you have to use it accordingly. That means assignments create copies of the original (i.e. changing the copy's destination value will leave the source unchanged), you need ref parameters if you want to persist changes made by subroutines, and you should avoid boxing at any cost to prevent even worse performance (some operations can mean boxing even though you won't immediately notice, e.g. passing a struct that implements an interface as an argument of the interface type). Using structs (just as with using Int64) will only be worth it when you create a lot of temporary values which you quickly throw away. And then you'll still need to confirm with a profiler that your performance is actually improved.
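To make the boxing pitfall mentioned above concrete, a small sketch (IMove and PackedMove are made-up illustration names, not part of the code above):

using System;

interface IMove { int Origin { get; } }

struct PackedMove : IMove          // illustrative stand-in for the Move struct above
{
    public int Origin { get; set; }
}

class BoxingDemo
{
    static void PrintOrigin(IMove m) => Console.WriteLine(m.Origin);

    static void Main()
    {
        var move = new PackedMove { Origin = 12 };

        // The struct is boxed here: a new heap object is allocated because
        // the parameter type is the interface, not the struct itself.
        PrintOrigin(move);
    }
}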
Recently, in this question, I asked how to get the raw memory address of a class instance in C# (it is a crude, unreliable hack and bad practice; don't use it unless you really need it). I succeeded, but then a problem arose: according to this article, the first 2 words in the raw memory representation of a class instance should be pointers to the SyncBlock and RTTI structures, and therefore the first field's address must be offset by 2 words [8 bytes on 32-bit systems, 16 bytes on 64-bit systems] from the beginning. However, when I dump the first bytes from memory at the object's location, the first field's raw offset from the object's address is only one 32-bit word (4 bytes), which doesn't make sense for either type of system. From the question I've linked:
using System;
using System.Runtime.InteropServices;

class Program
{
    // Here is the function.
    // I suggest looking at the original question's solution, as it is
    // more reliable.
    static IntPtr getPointerToObject(Object unmanagedObject)
    {
        GCHandle gcHandle = GCHandle.Alloc(unmanagedObject, GCHandleType.WeakTrackResurrection);
        IntPtr thePointer = Marshal.ReadIntPtr(GCHandle.ToIntPtr(gcHandle));
        gcHandle.Free();
        return thePointer;
    }

    class TestClass
    {
        uint a = 0xDEADBEEF;
    }

    static void Main(string[] args)
    {
        byte[] cls = new byte[16];
        var test = new TestClass();
        var thePointer = getPointerToObject(test);
        Marshal.Copy(thePointer, cls, 0, 16); // Dump first 16 bytes...
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(thePointer.ToInt32())));
        Console.WriteLine(BitConverter.ToString(cls));
        Console.ReadLine();
    }
}
/* Example output (yours should be different):
40-23-CA-02
4C-38-04-01-EF-BE-AD-DE-00-00-00-80-B4-21-50-73
That field's value is "EF-BE-AD-DE", 0xDEADBEEF as it is stored in memory. Yay, we found it!
*/
Why is this so? Maybe I just got the address wrong, but how and why? And if I didn't, what else could be wrong? Maybe, if that article is wrong, I simply misunderstood what the managed class header looks like? Or maybe it doesn't have that lock pointer - but why, and how is this possible?..
(These are, obviously, only a few possible options, and while I'm still going to carefully check each one I can think of, wild guessing cannot compare in either time or accuracy to a correct answer.)
Hans Passant brilliantly pointed out that the pointer to the object in question points to the second structure, the method table. That makes total sense for performance reasons, as the method table (the RTTI structure) is used far more often than the sync block, which is therefore still located right before it, at negative index -1.
He made it clear that he doesn't want to post this answer, so I'm posting it myself, but the credit still goes to him.
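As a rough way to see this for yourself, a sketch that reuses getPointerToObject and TestClass from the question and relies on undocumented CLR layout, so treat it as a curiosity only:

// Inside Main, reusing getPointerToObject(...) and TestClass from the question.
var test = new TestClass();
IntPtr objectPointer = getPointerToObject(test);

// The pointer-sized value at offset 0 is the method table pointer; the sync block
// index lives just before the object, at a negative offset.
IntPtr methodTable = Marshal.ReadIntPtr(objectPointer);
Console.WriteLine(methodTable == typeof(TestClass).TypeHandle.Value);   // True on the CLR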
But I would like to remind you that this is a dirty, unreliable hack, possibly making the system unstable:
Beyond the pinning problem, other nasty issues are not having any idea how long the object is and how the fields are arranged.
You should use the debugger instead, unless you understand all the consequences, understand exactly what you are trying to do, and really need to do it this dirty and unreliable way.
I am new to programming in C#.
Can anyone please explain memory management in C# to me?
class Student
{
    int Id;
    String Name;
    Double Marks;

    public string getStudentName()
    {
        return this.Name;
    }

    public double getPersantage()
    {
        return this.Marks * 100 / 500;
    }
}
I want to know how much memory is allocated for an instance of this class.
What about methods? Where are they allocated?
And if there are static methods, what about their storage?
Can anyone please briefly explain this to me?
An instance of the class itself will take up 24 bytes on a 32-bit CLR:
8 bytes of object overhead (sync block and type pointer)
4 bytes for the int
4 bytes for the string reference
8 bytes for the double
Note that the memory for the string itself is in addition to that - but many objects could share references to the same string, for example.
Methods don't incur the same sort of storage penalty as fields. Essentially they're associated with the type rather than an instance of the type, but there's the IL version and the JIT-compiled code to consider. However, usually you can ignore this in my experience. You'd have to have a large amount of code and very few instances for the memory taken up by the code to be significant compared with the data. The important thing is that you don't get a separate copy of each method for each instance.
EDIT: Note that you happened to pick a relatively easy case. In situations where you've got fields of logically smaller sizes (e.g. short or byte fields), the CLR chooses how to lay out the object in memory, such that values which require memory alignment (being on a word boundary) are laid out appropriately, possibly packing other ones together - so 4 byte fields could end up taking 4 bytes, or they could take 16 if the CLR decides to align each of them separately. I think that's implementation-specific, but it's possible that the CLI spec dictates the exact approach taken.
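For the unmanaged/marshaled layout (which the CLR's internal managed layout need not match), the effect of alignment and packing is easy to see with StructLayout - a small sketch:

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PackedSample { public byte A; public int B; }    // no padding between the fields

[StructLayout(LayoutKind.Sequential)]
struct AlignedSample { public byte A; public int B; }   // 3 padding bytes so B starts on a 4-byte boundary

class LayoutDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(PackedSample)));   // 5
        Console.WriteLine(Marshal.SizeOf(typeof(AlignedSample)));  // 8
    }
}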
As I think Jon Skeet is saying, it depends on a lot of factors and is not easily measurable ahead of time. Factors such as whether it's running on a 64-bit or 32-bit OS must be taken into account, and whether you are running a debug or release version also comes into play. The amount of memory taken up by code depends on the processor that the JITter compiles for, as different optimizations can be used for different processors.
Not really an answer, just for fun.
struct Student
{
    int Id;

    [MarshalAs(UnmanagedType.LPStr)]
    String Name;

    Double Marks;

    public string getStudentName()
    {
        return this.Name;
    }

    public double getPersantage()
    {
        return this.Marks * 100 / 500;
    }
}
And
Console.WriteLine(Marshal.SizeOf(typeof(Student)));
On 64-bit it returns:
24
And on 32-bit:
16
sizeof(double) - the return type of getPersantage() - is a good way to find out the bytes for it. Not too sure, haven't done much C#, but better with an answer than no answer :=)