Assessing the size of an instance - C#

I know there are some questions about this, but I want to consider it from a different perspective.
My conclusion after researching this question was that there can't be an exact way to weigh an instance in bits. But isn't there a way to approach a realistic size based on storage information?
For instance:
class Test
{
    // Fields
    private int high = 7;
    private int low = 5;

    private int GetHigh()
    {
        return high; // Access private field
    }

    private int GetLow()
    {
        return low; // Access private field
    }

    public float Average()
    {
        return (GetHigh() + GetLow()) / 2f; // Access private methods
    }
}
Test a = new Test();
Test b = new Test();
a.Average();
So my actual question is: can we assess the size of instance a (and b) in certain ways?
I mean we could simply calculate the fields:
2 int fields = 64 bits
But what about metadata?
How can we account for the methods, or can we at all, since they are part of the class's type object?
Methods are shared by all instances and only differ in the this pointer they are called with.
Can somebody please help me get a better understanding of this problem? To be clear, it's not about getting an exact bit count or a method that can calculate it by manipulating VS. It's more about getting a better understanding of how instances are stored in memory.
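There is no supported API that reports the exact size of a managed instance, but you can approximate the per-instance heap cost empirically by allocating many instances and comparing GC.GetTotalMemory before and after. Below is a rough sketch (EstimateInstanceSize is just an illustrative helper name, not a framework method); note that it only measures the object header, fields, and padding, while the methods and other type metadata exist once per type, not per instance:
static long EstimateInstanceSize<T>(Func<T> factory, int count = 100000) where T : class
{
    var keep = new T[count];                                    // hold references so nothing is collected mid-measurement
    long before = GC.GetTotalMemory(forceFullCollection: true);
    for (int i = 0; i < count; i++)
        keep[i] = factory();
    long after = GC.GetTotalMemory(forceFullCollection: true);
    GC.KeepAlive(keep);
    return (after - before) / count;                            // average bytes per instance, header and padding included
}

// Prints roughly 24 on 64 bit: 16 bytes of header/method-table pointer plus two 4-byte ints
Console.WriteLine(EstimateInstanceSize(() => new Test()));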

Related

C# store and read large array of objects

I have an application (WinForms, C#) that performs calculations on a raster. The calculation results are stored as objects in an array; the total array length depends on the project but is currently around 1 million entries (and I want to make it larger, even 2 or 3 million). The goal of the application is to perform queries on the data: the user (de)selects some properties, then the app iterates over the array and summarizes the values of the objects for each array entry. The results are shown as a picture (each pixel is an array entry).
Currently I'm storing the data as a compressed JSON string on disk, so I'm loading all the data into memory. The advantage of doing this is that the queries are performed very fast (max 2 seconds). The disadvantage is that it takes a lot of memory, and it will throw an out-of-memory exception if the array becomes larger (I'm already building the app for 64 bit).
Question: is there a way of storing my array on disk, without loading the entire array into memory, while still performing the queries very fast? I've done some tests with LiteDB, but executing the queries is not fast enough (though I have no experience with LiteDB, so maybe I'm doing something wrong). Is a database like LiteDB a good solution? Or is loading all the data into memory the only option?
Update: each entry in my array is a List of CellResultPart, with around 1 to 10 items in the list. The definition is as follows:
public struct CellResultPart
{
    public CellResultPart(double designElevation, double existingElevation)
    {
        DesignElevation = designElevation;
        ExistingElevation = existingElevation;
        MaterialName = "<None>";
        Location = "<None>";
        EnvironmentalClass = "<None>";
        ElevationTop = double.NaN;
        ElevationBottom = double.NaN;
        ElevationLayerTop = double.NaN;
        ElevationLayerBottom = double.NaN;
        DepthLayerTop = double.NaN;
        DepthLayerBottom = double.NaN;
    }

    public double DesignElevation;
    public double ExistingElevation;

    public double Depth
    {
        get
        {
            if (IsExcavation)
            {
                return -Math.Round(Math.Abs(DepthBottom - DepthTop), 3);
            }
            else
            {
                return Math.Round(Math.Abs(DepthBottom - DepthTop), 3);
            }
        }
    }

    public double ElevationTop;
    public double ElevationBottom;
    public double ElevationLayerTop;
    public double ElevationLayerBottom;

    public double DepthTop
    {
        get
        {
            if (IsExcavation)
            {
                return -Math.Round(Math.Abs(ExistingElevation - ElevationTop), 3);
            }
            else
            {
                return Math.Round(Math.Abs(DesignElevation - ElevationTop), 3);
            }
        }
    }

    public double DepthBottom
    {
        get
        {
            if (IsExcavation)
            {
                return -Math.Round(Math.Abs(ExistingElevation - ElevationBottom), 3);
            }
            else
            {
                return Math.Round(Math.Abs(DesignElevation - ElevationBottom), 3);
            }
        }
    }

    public double DepthLayerTop;
    public double DepthLayerBottom;
    public string EnvironmentalClass;
    public string Location;
    public string MaterialName;

    public bool IsExcavation
    {
        get
        {
            if (DesignElevation > ExistingElevation)
            {
                return false;
            }
            else
            {
                return true;
            }
        }
    }
}
Let's make some rough calculations. You have roughly 10 doubles and 3 strings per item. Let's assume the strings are on average 20 characters. That should give you about 200 bytes per entry, or 200-600 MB overall for 1-3 million entries. That should be feasible to keep in memory, even on a 32 bit system.
Using JSON will probably not help, since it makes the data much larger. I would consider a binary format that is closer to the theoretically required size. I have used protobuf-net with good results. It also supports SerializeWithLengthPrefix, which lets you serialize each object independently of the others into a single stream, so you avoid having to keep everything in memory at the same time.
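To make the length-prefix idea concrete, here is a rough sketch using protobuf-net's Serializer.SerializeWithLengthPrefix and DeserializeItems; CellResultRecord, the file name, the records sequence, and the Aggregate call are placeholders standing in for your own types and query code:
// requires the protobuf-net NuGet package: using ProtoBuf; using System.IO;
[ProtoContract]
public class CellResultRecord            // hypothetical flat DTO mirroring CellResultPart's stored fields
{
    [ProtoMember(1)] public double DesignElevation;
    [ProtoMember(2)] public double ExistingElevation;
    [ProtoMember(3)] public string MaterialName;
    // ... remaining stored fields with their own ProtoMember numbers
}

// Writing: one length-prefixed message per record, so the whole array never has to be in memory at once
using (var stream = File.Create("results.bin"))
{
    foreach (var record in records)
        Serializer.SerializeWithLengthPrefix(stream, record, PrefixStyle.Base128, 1);
}

// Reading: stream the records back one at a time and fold them into the query result
using (var stream = File.OpenRead("results.bin"))
{
    foreach (var record in Serializer.DeserializeItems<CellResultRecord>(stream, PrefixStyle.Base128, 1))
        Aggregate(record);               // hypothetical per-record summarize step
}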
The other option would be to use some kind of database. Such a solution would most likely scale better as the size increases. Database performance is mostly a matter of using appropriate indices; I assume that is why your attempt went poorly. Creating good indices may be difficult if you have no idea what queries will be run, but I would still expect a database to perform better than a linear search.
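If you do revisit the database route, the index point above is the first thing to check. A minimal LiteDB sketch, assuming a class-based record type (LiteDB maps POCO classes; CellResultRecord, the file name, and the MaterialName filter are illustrative, not prescriptive):
// using LiteDB;
using (var db = new LiteDatabase(@"results.db"))
{
    var col = db.GetCollection<CellResultRecord>("results");

    // Index the properties your queries filter on; without an index every query is a full scan
    col.EnsureIndex(x => x.MaterialName);

    var matches = col.Find(x => x.MaterialName == "Sand");   // can now be answered from the index
}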

Manage hundreds of classes without creating and destroying them?

I have a class A that works with hundreds or thousands of other classes, and each of those classes has a method that does some calculation, for example.
Class A has a method that chooses which of those hundreds or thousands of classes runs, and that method of class A runs many times in a short time.
The solution I thought of at the beginning was to have the instances already created inside class A, to avoid creating and destroying them every time the event runs and having the garbage collector consume CPU. But class A, as I say, is going to work with hundreds or thousands of classes, and keeping them all loaded is too high a cost in memory (I think).
My question is: can you think of an optimal way to work with hundreds or thousands of classes, some of which will run every second, without having to create and destroy them in each execution of the method that works with them?
Edit:
First example: create and keep the instances, then reuse them. I think this costs memory, but it keeps the garbage collector from working too much.
public class ClassA {
    Class1 class1;
    Class2 class2;
    // ... more classes
    Class100 class100;

    public ClassA() {
        class1 = new Class1();
        // ... initializations
        class100 = new Class100();
    }

    public void ChooseClass(int numberClass) {
        switch (numberClass) {
            case 1:
                class1.calculate();
                break;
            case 2:
                class2.run();
                break;
            // ... more cases, one for each class
            case 100:
                class100.method();
                break;
            default:
                break;
        }
    }
}
Second example: create the instance only when it is used. This saves memory, but the garbage collector consumes a lot of CPU.
public class ClassA {
    public void ChooseClass(int numberClass) {
        switch (numberClass) {
            case 1:
                Class1 class1 = new Class1();
                class1.calculate();
                break;
            case 2:
                Class2 class2 = new Class2();
                class2.run();
                break;
            // ... more cases, one for each class
            case 100:
                Class100 class100 = new Class100();
                class100.method();
                break;
            default:
                break;
        }
    }
}
The basic problem you face when you start increasing the number of class instances is that they all need to be accounted for and tracked during garbage collection, even if you never free them. There comes a point when the program spends more time performing garbage collection than actual work. We experienced this kind of performance problem with a binary search tree that ended up containing several million nodes that were originally class instances.
We were able to circumvent this by using a List<T> of structs rather than classes. (The memory of a list is backed by an array, and for structs the garbage collector only needs to track a single reference to that array.) Now, instead of references to a class, we store indices into this list in order to access a desired instance of the struct.
In fact we also faced the problem (note that newer versions of the .NET Framework do away with this limitation) that the backing array couldn't grow beyond 2 GB even under 64 bit, so we split storage over several lists (256) and used a 32 bit index where 8 bits acted as a list selector and the remaining 24 bits served as an index into the list.
Of course it is convenient to build a class that abstracts all these details, and you need to be aware that when modifying a struct you actually have to copy it to a local variable, modify it, and then replace the original struct with the modified copy; otherwise your changes happen on a temporary copy of the struct and are not reflected in your data collection. There is also a performance cost, which fortunately pays for itself once the collection is large enough, with extremely fast garbage collection cycles.
Here is some (quite old) code showing these ideas in place; migrating our search tree to this approach took a server from spending nearly 100% of its CPU time to around 15%.
public class SplitList<T> where T : struct {
    // A virtual list divided into several sublists, removing the 2GB capacity limit
    private List<T>[] _lists;
    private Queue<int> _free = new Queue<int>();
    private int _maxId = 0;

    private const int _hashingBits = 8;
    private const int _listSelector = 32 - _hashingBits;
    private const int _subIndexMask = (1 << _listSelector) - 1;

    public SplitList() {
        int listCount = 1 << _hashingBits;
        _lists = new List<T>[listCount];
        for( int i = 0; i < listCount; i++ )
            _lists[i] = new List<T>();
    }

    // Access a struct by index
    // Remember that this returns a local copy of the struct, so if changes are to be done,
    // the copy must be taken into a local struct, modified, and then copied back to the list
    public T this[int idx] {
        get {
            return _lists[idx >> _listSelector][idx & _subIndexMask];
        }
        set {
            _lists[idx >> _listSelector][idx & _subIndexMask] = value;
        }
    }

    // returns an index to a "new" struct inside the collection
    public int New() {
        int result;
        T newElement = new T();
        // are there any free indexes available?
        if( _free.Count > 0 ) {
            // yes, return a free index and initialize the reused struct to default values
            result = _free.Dequeue();
            this[result] = newElement;
        } else {
            // no, grow the capacity (post-increment so the first element gets index 0,
            // keeping the index aligned with its position inside the sublist)
            result = _maxId++;
            List<T> list = _lists[result >> _listSelector];
            list.Add(newElement);
        }
        return result;
    }

    // free an index and allow the struct slot to be reused.
    public void Free(int idx) {
        _free.Enqueue(idx);
    }
}
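For illustration, a minimal round trip against this container might look like the following sketch (Point is a hypothetical payload struct); note the read-modify-write pattern described above:
public struct Point { public int X; public int Y; }   // hypothetical payload struct

var points = new SplitList<Point>();
int idx = points.New();               // allocate a slot and get back an index instead of a reference

Point p = points[idx];                // copy the struct out...
p.X = 10;                             // ...modify the local copy...
p.Y = 20;
points[idx] = p;                      // ...and write it back, otherwise the change is lost

points.Free(idx);                     // return the slot for reuse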
Here is a snippet of how our binary tree implementation ended up looking using this SplitList backing container class:
public class CLookupTree {
    public struct TreeNode {
        public int HashValue;
        public int LeftIdx;
        public int RightIdx;
        public int firstSpotIdx;
    }

    SplitList<TreeNode> _nodes;
    …

    private int RotateLeft(int idx) {
        // Performs a tree rotation to the left, here you can see how we need
        // to retrieve the struct to a local copy (thisNode), modify it, and
        // push back the modifications to the node storage list
        // Also note that we are working with indexes rather than references to
        // the nodes
        TreeNode thisNode = _nodes[idx];
        int result = thisNode.RightIdx;
        TreeNode rightNode = _nodes[result];

        thisNode.RightIdx = rightNode.LeftIdx;
        rightNode.LeftIdx = idx;

        _nodes[idx] = thisNode;
        _nodes[result] = rightNode;

        return result;
    }
}

Can this be simplified with Lambda/Block initialization?

Been bouncing back and forth between Swift and C# and I'm not sure if I'm forgetting certain things, or if C# just doesn't easily support what I'm after.
Consider this code which calculates the initial value for Foo:
// Note: This is a field on an object, not a local variable.
int Foo = CalculateInitialFoo();
static int CalculateInitialFoo() {
    int x = 0;
    // Perform calculations to get x
    return x;
}
Is there any way to do something like this without the need to create the separate one-time-use function and instead use an instantly-executing lambda/block/whatever?
In Swift, it's simple. You use a closure (the curly-braces) that you instantly execute (open and closed parentheses), like this:
int Foo = {
    int x = 0
    // Perform calculations to get x
    return x
}()
It's clear, concise and doesn't clutter up the object's interface with functions just to initialize fields.
Note: To be clear, I do NOT want a calculated property. I am trying to initialize a member field which requires multiple statements to do completely.
I wouldn't suggest doing this, but you could use an anonymous function to initialize it:
int _foo = new Func<int>(() =>
{
    return 5;
})();
Is there a reason you would like to do it using lambdas rather than named functions, or as a calculated property?
I assume you want to avoid calculated properties because you want to either modify the value later, or the computation is expensive and you want to cache the value.
int? _fooBacking = null;
int Foo
{
    get
    {
        if (!_fooBacking.HasValue)
        {
            _fooBacking = 5;
        }
        return _fooBacking.Value;
    }
    set
    {
        _fooBacking = value;
    }
}
This evaluates the expression in the conditional the first time the property is read, while still allowing the value to be assigned later.
If you remove the setter, it turns into a cached calculation. Be careful when using this pattern, though: side effects in property getters are frowned upon because they make the code difficult to follow.
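As an aside, the BCL's Lazy<T> packages this same cache-on-first-read behavior, so the multi-statement initializer can live in a lambda while the backing-field bookkeeping stays out of your code; the trade-off is that the value is computed on first access rather than in the constructor. A sketch:
private readonly Lazy<int> _foo = new Lazy<int>(() =>
{
    int x = 0;
    // Perform calculations to get x
    return x;
});

public int Foo
{
    get { return _foo.Value; }   // the lambda runs once, on first access; the result is cached
}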
To solve the problem in the general case you'd need to create and then execute an anonymous function, which you can technically do as an expression:
int Foo = new Func<int>(() =>
{
    int x = 0;
    // Perform calculations to get x
    return x;
})();
You can clean this up a bit by writing a helper function:
public static T Perform<T>(Func<T> function)
{
    return function();
}
Which lets you write:
int Foo = Perform(() =>
{
    int x = 0;
    // Perform calculations to get x
    return x;
});
While this is better than the first, I think it's pretty hard to argue that either is better than just writing a function.
In the non-general case, many specific implementations can be altered to run on a single line rather than multiple lines. Such a solution may be possible in your case, but we couldn't possibly say without knowing what it is. There will be cases where this is possible but undesirable, and cases where this may actually be preferable. Which are which is of course subjective.
You could initialize your field in the constructor and declare CalculateInitialFoo as a local function.
private int _foo;

public MyType()
{
    _foo = CalculateInitialFoo();

    int CalculateInitialFoo()
    {
        int x = 0;
        // Perform calculations to get x
        return x;
    }
}
This won't change your code too much but you can at least limit the scope of the method to where it's only used.

How to properly Clear a Queue containing structs?

I have declared a basic struct like this
private struct ValLine {
    public string val;
    public ulong linenum;
}
and declared a Queue like this
Queue<ValLine> check = new Queue<ValLine>();
Then in a using StreamReader setup where I'm reading through the lines of an input file using ReadLine in a while loop, among other things, I'm doing this to populate the Queue:
check.Enqueue(new ValLine { val = line, linenum = linenum });
("line" is a string containing the text of each line, "linenum" is just a counter that is initialized at 0 and is incremented each time through the loop.)
The purpose of the "check" Queue is that if a particular line meets some criteria, then I store that line in "check" along with the line number that it occurs on in the input file.
After I've finished reading through the input file, I use "check" for various things, but then when I'm finished using it I clear it out in the obvious manner:
check.Clear();
(Alternatively, in my final loop through "check" I could just use .Dequeue(), instead of foreach'ing it.)
But then I got to thinking - wait a minute, what about all those "new ValLine" I generated when populating the Queue in the first place??? Have I created a memory leak? I'm pretty new to C#, so it's not clear to me how to deal with this - or even if it should be dealt with (perhaps .Clear() or .Dequeue() deals with the now-obsolete structs automatically?). I've spent over an hour with our dear friend Google and just haven't found any specific discussion of this kind of example with regard to clearing a collection of structs.
So... In C# do we need to deal with wiping out the individual structs before clearing the queue (or as we are dequeueing), or not? And if so, then what is the proper way to do this?
(Just in case it's relevant, I'm using .NET 4.5 in Visual Studio 2013.)
UPDATE: This is for future reference (you know, like if this page comes up in a Google search) in regard to proper coding. To make the struct immutable as per recommendation, this is what I've ended up with:
private struct ValLine {
    private readonly string _val;
    private readonly ulong _linenum;
    public string val { get { return _val; } }
    public ulong linenum { get { return _linenum; } }
    public ValLine(string x, ulong n) { _val = x; _linenum = n; }
}
Corresponding to that change, the queue population line is now this:
check.Enqueue(new ValLine(line,linenum));
Also, though not strictly necessary, I did get rid of my foreach over the queue (and the check.Clear();) and changed it to this
while (check.Count > 0) {
    ValLine ll = check.Dequeue();
    writer.WriteLine("[{0}] {1}", ll.linenum, ll.val);
}
so that the queue is emptied out as the information is output.
UPDATE 2: Okay, yes, I'm still a C# newbie (less than a year). I learn a lot from the Internet, but of course, I'm often looking at examples from more than a year ago. I have changed my struct so now it looks like this:
private struct ValLine {
    public string val { get; private set; }
    public ulong linenum { get; private set; }
    public ValLine(string x, ulong n) : this()
    { this.val = x; this.linenum = n; }
}
Interestingly enough, I had actually tried exactly this off the top of my head before coming up with what's in the first update (above), but got a compile error (because I did not have the : this() on the constructor). Upon further suggestion, I checked again and found a recent example showing that : this() makes it work the way I originally tried, plugged that in, and - voilà! - a clean compile. I like the cleaner look of the code. What the private variables are called is irrelevant to me.
No, you won't have created a memory leak. Calling Clear or Dequeue will clear the memory appropriately - for example, if you had a List<T> then a clear operation might use:
for (int i = 0; i < capacity; i++)
{
    array[i] = default(T);
}
I don't know offhand whether Queue<T> is implemented with a circular buffer built on an array, or a linked list - but either way, you'll be fine.
Having said that, I would strongly recommend against using mutable structs as you're doing here, along with mutable fields. While it's not causing the particular problem you're envisaging, they can behave in confusing ways.
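To make the "confusing ways" concrete, here is a small sketch using the original mutable ValLine: assignment copies the whole struct, and collection indexers hand back copies too.
var a = new ValLine { val = "first", linenum = 1 };
var b = a;                    // value copy, not a second reference to the same data
b.val = "second";
Console.WriteLine(a.val);     // still prints "first" - only the copy was changed

var list = new List<ValLine> { a };
// list[0].val = "third";     // compile error CS1612: the indexer returns a copy, so the write would be lost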

How to validate two out parameters do not point to the same address?

I created a method that takes 2 out parameters. I noticed that it is possible for calling code to pass in the same variable for both parameters, but this method requires that these parameters be separate. I came up with what I think is the best way to validate that this is true, but I am unsure if it will work 100% of the time. Here is the code I came up with, with questions embedded.
private static void callTwoOuts()
{
    int same = 0;
    twoOuts(out same, out same);
    Console.WriteLine(same); // "2"
}

private static void twoOuts(out int one, out int two)
{
    unsafe
    {
        // Is the following line guaranteed atomic so that it will always work?
        // Or could the GC move 'same' to a different address between statements?
        fixed (int* oneAddr = &one, twoAddr = &two)
        {
            if (oneAddr == twoAddr)
            {
                throw new ArgumentException("one and two must be separate variables!");
            }
        }

        // Does this help?
        GC.KeepAlive(one);
        GC.KeepAlive(two);
    }

    one = 1;
    two = 2;
    // Assume more complicated code that requires one/two be separate
}
I know that an easier way to solve this problem would simply be to use method-local variables and only copy to the out parameters at the end, but I am curious if there is an easy way to validate the addresses such that this is not required.
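For comparison, the "easier way" mentioned above would look roughly like this (twoOutsSafe is just an illustrative name); because all the work happens in locals, aliasing of the out parameters stops mattering:
private static void twoOutsSafe(out int one, out int two)
{
    // Work entirely in locals; whether the callers' out arguments alias each other is irrelevant here
    int localOne = 1;
    int localTwo = 2;

    // ... more complicated code that requires the two values to be separate ...

    one = localOne;   // if both out parameters do alias the same variable,
    two = localTwo;   // it simply ends up holding the last value written
}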
I'm not sure why you would ever want to know this, but here's a possible hack:
private static void AreSameParameter(out int one, out int two)
{
    one = 1;
    two = 1;

    one = 2;
    if (two == 2)
        Console.WriteLine("Same");
    else
        Console.WriteLine("Different");
}

static void Main(string[] args)
{
    int a;
    int b;

    AreSameParameter(out a, out a); // Same
    AreSameParameter(out a, out b); // Different

    Console.ReadLine();
}
First, both variables are set to the same value. Then one variable is set to a different value: if the other variable changed as well, they both point to the same variable.
