C# memory handling for methods


I have a question about the way C# functions, or methods, handle memory when certain objects are used as input arguments. I have tried searching for an answer to this but haven't been able to find anything; I might not know what to look for, though.
The question: Say I have a really big integer array of size 10,000 by 10,000, called 'MyArray'. Let's say I also have a method called 'MyMethod' which takes several entries from two specified rows of MyArray (this is the input), performs some operations on them, such as adding or multiplying these numbers, and then returns another integer.
To keep my code as short as possible I would prefer to make a method
MyMethod(int i, int j, int[][] MyArray)
rather than having to enter all the numbers from the array as separate arguments. However, does this mean the method creates a copy of MyArray when it is called, or does C# know that if this data is only read and not edited in any way, making a copy isn't needed?

In C#, arrays are actually objects, and not just addressable regions of contiguous memory as in C and C++. Thus, in our case, only the reference of the array is passed as an argument for the method.

C# does not create a copy as the array will be passed as a reference (like a C++ pointer) to the method. In general only struct types will be passed as a copy and normal class instances will be passed as a reference.
You can read more on the topic on MSDN
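To make this concrete, here is a minimal sketch. The body of MyMethod and the helper SetFirst are made up for illustration (the question does not say what the method computes). If the array were copied on each call, the write done inside SetFirst would not be visible to the caller.

```csharp
using System;

public static class PassArrayDemo
{
    // Hypothetical stand-in for MyMethod: adds the first entry of row i and row j.
    public static int MyMethod(int i, int j, int[][] myArray)
    {
        return myArray[i][0] + myArray[j][0];
    }

    // Hypothetical helper: writes into the array it receives.
    public static void SetFirst(int[][] myArray, int value)
    {
        myArray[0][0] = value;
    }

    public static void Main()
    {
        // A small jagged array stands in for the 10,000 by 10,000 one.
        int[][] myArray = new int[3][];
        for (int r = 0; r < myArray.Length; r++)
        {
            myArray[r] = new int[] { r + 1, 0 };
        }

        Console.WriteLine(MyMethod(0, 2, myArray)); // 4 (1 + 3)

        SetFirst(myArray, 100);
        Console.WriteLine(myArray[0][0]);           // 100: the caller sees the change
    }
}
```

Only the reference is copied when the method is called, so the cost of the call does not depend on the size of the array.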

As you can read here : MSDN - Passing arrays as argument
Arrays can be passed as arguments to method parameters. Because arrays are reference types, the method can change the value of the elements.

Arrays are classes, so a variable of array type holds just a reference, and passing an array to a method only copies that reference (an address of 4 or 8 bytes). Proof:
Boolean isClass = typeof(int[][]).IsClass; // returns true
Structs are passed by value, e.g. int is a struct:
Boolean isClass = typeof(int).IsClass; // returns false
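A small sketch of what that difference means in practice. PointValue and the two Change methods are invented for this example; they are not from the question.

```csharp
using System;

// Hypothetical value type used only for this demo.
public struct PointValue
{
    public int X;
}

public static class ValueVsReferenceDemo
{
    // Receives a copy of the struct, so the caller's value is untouched.
    public static void ChangeStruct(PointValue p)
    {
        p.X = 99;
    }

    // Receives a copy of the reference, which still points at the caller's array.
    public static void ChangeArray(int[] a)
    {
        a[0] = 99;
    }

    public static void Main()
    {
        PointValue p = new PointValue { X = 1 };
        int[] a = { 1 };

        ChangeStruct(p);
        ChangeArray(a);

        Console.WriteLine(p.X);  // 1
        Console.WriteLine(a[0]); // 99
    }
}
```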

Related

C# function that accepts a jagged array of any dimension

Is there any way to declare a function that takes in a jagged array of any dimension in C#? Right now, if I pass a jagged array to a function that takes 1-dimensional arrays, it gives me an error:
void func<T>(T[] arr){
return;
}
double[][] x = new double[10][];
func<double>(x);
// Error: cannot convert from double[][] to double[]
Is there any way I can change the function signature to make it accept both 1 dimensional array and jagged arrays? The function will only perform actions that would work on both jagged arrays and 1D arrays. I don't really want to use overload since the dimension of the jagged array might be unknown.

To accept an array argument of any type, any size, and any number of dimensions, you can use the Array class from the System namespace.
Here is what it would look like:
void Function(Array arr)
{
//Do your stuff with "arr",
//you can mostly just treat it like a normal array, looping and so on...
}
And if you would like to know the length of a specific dimension, just use the Array.GetLength(int dimension) method.
If you would also like to know how many dimensions this array has, just use the Array.Rank property.
As for the question, you can simply do this and it won't cause any problems:
double[][] x = new double[10][];
Function(x);
MS Doc:
https://learn.microsoft.com/en-us/dotnet/api/system.array?view=netcore-3.1
https://learn.microsoft.com/en-us/dotnet/api/system.array.rank?view=netcore-3.1
Hope this helped you!
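One thing to watch with this approach, shown in the sketch below (Describe is a made-up helper name): a jagged array such as double[][] is a one-dimensional array whose elements are arrays, so Rank reports 1 for it. Only rectangular arrays such as double[,] report a higher rank.

```csharp
using System;

public static class ArrayInfoDemo
{
    // Hypothetical helper: reports rank and the length of the first dimension.
    public static string Describe(Array arr)
    {
        return "rank=" + arr.Rank + ", length(0)=" + arr.GetLength(0);
    }

    public static void Main()
    {
        double[] flat = new double[5];
        double[][] jagged = new double[10][];
        double[,] rectangular = new double[3, 4];

        Console.WriteLine(Describe(flat));        // rank=1, length(0)=5
        Console.WriteLine(Describe(jagged));      // rank=1, length(0)=10
        Console.WriteLine(Describe(rectangular)); // rank=2, length(0)=3
    }
}
```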

Is there any way to declare a function that takes in a jagged array of any dimension in C#?
Yes, and you already actually have one. Your solution takes any T[], and T itself can be an array type - which is precisely how jagged arrays work. So let's take your original function (and just change the name to be more conventional and avoid confusion with the Func delegate type):
void ProcessArray<T>(T[] array)
{
// (Method body is irrelevant for now)
}
If you want that method to process a double[], then the type argument for the method (i.e. the type of T for that method invocation) needs to be double. If you want the method to process a double[][], the type argument needs to be double[], and so on.
You can do this by explicitly specifying the type argument:
double[] simpleArray = new double[10];
ProcessArray<double>(simpleArray);
double[][] jaggedArray = new double[10][];
ProcessArray<double[]>(jaggedArray);
Alternatively, you can use generic type inference to let the compiler figure out what type argument you want for T automatically:
double[] simpleArray = new double[10];
ProcessArray(simpleArray);
double[][] jaggedArray = new double[10][];
ProcessArray(jaggedArray);
In my experience, you can usually use type inference when calling generic methods, but there are exceptions where you need to specify the type arguments explicitly.
The problem you faced was that you were providing a type argument of double, but a regular argument of type double[][].
Now, if in your method you want to process all the individual elements, e.g. recursively calling your method with each of the "less jagged" arrays, you may need to use either reflection (e.g. typeof(T).GetElementType()) or members of the Array type to get more information at execution time. We don't know what your method needs to accomplish, so we can't provide any more concrete advice than that right now - but hopefully you can experiment with those two approaches.
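As a rough sketch of the second approach (members of the Array type), here is one way a method could walk a jagged array of unknown depth. CountLeaves is a name invented for this example, and counting elements is only a placeholder for whatever the real method needs to do.

```csharp
using System;

public static class JaggedWalkDemo
{
    // Recursively counts the non-array elements in an array of any jagged depth.
    // Rows that are still null are skipped.
    public static int CountLeaves(Array array)
    {
        int count = 0;
        foreach (object element in array)
        {
            if (element is Array inner)
            {
                count += CountLeaves(inner);
            }
            else if (element != null)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        double[] flat = new double[4];

        double[][] jagged = new double[3][];
        jagged[0] = new double[] { 1, 2 };
        jagged[1] = new double[] { 3 };
        // jagged[2] is left null on purpose.

        Console.WriteLine(CountLeaves(flat));   // 4
        Console.WriteLine(CountLeaves(jagged)); // 3

        // The reflection route: the element type of double[][] is double[].
        Console.WriteLine(typeof(double[][]).GetElementType() == typeof(double[])); // True
    }
}
```

Note that iterating a double[] through the non-generic Array type boxes each element, so this is a sketch for clarity rather than for speed.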

<T> means any type. As long as you respect the operations and the valid states of this type T, it can be anything. Furthermore, T is deduced from the arguments you pass to the method, so you need to be explicit only when declaring it, not when invoking it.
So, say you changed the method to accept N-dimensional arrays and called it... idk, Transform:
This could be its signature: void Transform<T>(T array). In this case T itself is already the array type. If the parameter were instead an array of T, the signature would be void f<T>(T[] a), and so on.
T is a generic type parameter, similar in spirit to a C++ template parameter, except that C# generics are checked at compile time and instantiated by the runtime. Generic programming is immensely powerful, and thanks to C#'s coder-first philosophy we get to experience it in its full glory without having to shed tears over a C++ textbook.

Have a method directly edit a variable passed as a parameter?

I have a quick question (this is in C#). Let's say I have an array of numbers:
int[] count = new int[4] {0, 4, 3, 2};
I have a method that does some stuff:
public void Invert(int[] arrayVar)
{
for (int i = 0; i < arrayVar.Length; i++)
{
//arrayVar[i] = stuff
}
}
If I call the method by doing this:
Invert(count);
Is there a way to have the method directly edit the count array instead of just duplicating it and editing the duplicate? I can't have a global variable for multithreading reasons and I can't return the end result because I have similar methods that have to return very specific things. Is this possible? Thanks!

Is there a way to have the method directly edit the count array instead of just duplicating it and editing the duplicate?
Yes. Do exactly what you are doing. Your program already does exactly what you are asking for.
Arrays are reference types in C#. count and arrayVar refer to the same array. When you pass an array to a method, that method does not get a copy of the array. It gets a copy of a reference to the array.
Changes that you make to arrayVar inside Invert will also be made to count inside the caller because those two variables both contain a reference to the same array.
Do not confuse this with the ref feature of C#. Ref makes two variables act as though they are the same variable. Here you have two different variables that both refer to the same array. Make sure that the distinction is clear in your mind.
A number of answers confusingly suggest that you use a list instead of an array. Lists are also reference types; they have the same semantics as arrays when passed to a method. That is, the passed-in value is a reference. The reason to use a list instead of an array is because lists are more flexible and powerful than arrays. Arrays are fixed in size; an array with ten elements always has ten elements. A list can have new elements added or old elements removed.
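The distinction between sharing a reference and the ref feature can be sketched like this. Fill, Reassign, and ReassignRef are names made up for the example.

```csharp
using System;

public static class RefVersusReferenceDemo
{
    // Edits the caller's array through the copied reference.
    public static void Fill(int[] arrayVar)
    {
        for (int i = 0; i < arrayVar.Length; i++)
        {
            arrayVar[i] = 7;
        }
    }

    // Only the local variable is redirected; the caller's variable is unaffected.
    public static void Reassign(int[] arrayVar)
    {
        arrayVar = new int[] { -1 };
    }

    // With ref, the parameter is an alias for the caller's variable itself.
    public static void ReassignRef(ref int[] arrayVar)
    {
        arrayVar = new int[] { -1 };
    }

    public static void Main()
    {
        int[] count = new int[4] { 0, 4, 3, 2 };

        Fill(count);
        Console.WriteLine(count[1]);     // 7

        Reassign(count);
        Console.WriteLine(count.Length); // 4: still the original array

        ReassignRef(ref count);
        Console.WriteLine(count.Length); // 1: count now refers to the new array
    }
}
```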

Why should each element in array be allocated again in c#

Following is the code I wrote
Calc[] calculators = new Calc[10];
calculators[0].AddToSum(10);
(The corresponding classes and methods are written.)
But I got an "Object reference not set to an instance of an object" exception. Then, with some research, I got rid of the exception by doing the following.
for (int i = 0; i < 10; i++)
{
calculators[i] = new Calc();
}
Can somebody explain why we need to allocate memory again, unlike in C/C++?
This is how I did it in C++:
Calculator *calc = new Calculator[10]; // I know I need to check for std::bad_alloc exception
calc[0].AddToSum(10);
delete[] calc;

In C#, there are reference types, and there are value types. Classes are reference types. When you create a variable of a reference type, you are creating a reference, not an object. The default state of a reference is null. If you want it to refer to an object, you have to explicitly initialize it with new, or assign it from another initialized reference.
C++ does not have this distinction. Every type is a value type (though you can also create references to any type). When you create a variable of a value type, you are creating an object.
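A minimal sketch of that default state. The Calc class here is an assumed shape; the question does not show the real one.

```csharp
using System;

// Assumed shape of the asker's class; the real Calc is not shown in the question.
public class Calc
{
    public int Sum;

    public void AddToSum(int value)
    {
        Sum += value;
    }
}

public static class NullElementsDemo
{
    // Counts how many slots of the array do not refer to an object yet.
    public static int CountNulls(Calc[] items)
    {
        int nulls = 0;
        foreach (Calc item in items)
        {
            if (item == null)
            {
                nulls++;
            }
        }
        return nulls;
    }

    public static void Main()
    {
        Calc[] calculators = new Calc[10];
        Console.WriteLine(CountNulls(calculators)); // 10: only references exist so far

        for (int i = 0; i < calculators.Length; i++)
        {
            calculators[i] = new Calc();
        }

        calculators[0].AddToSum(10);
        Console.WriteLine(calculators[0].Sum);      // 10
    }
}
```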

In new Calc[10] you are allocating and sizing the array. In new Calc() you are creating the actual Calc objects.

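But you would hit the same problem with a single variable:
Calc calc;
calc.AddToSum(10);
The reference is null until you assign a value. (For a field that means the same exception at run time; for a local variable the compiler refuses to compile the use of an unassigned variable.)
Calc[] calculators = new Calc[10]; allocates the array of references but does not allocate any Calc objects.
Based on the answer from Benjamin (+1), this happens because Calc is a reference type.
Can you just make Calc a struct?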
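If making it a struct is an option, a sketch could look like the following. CalcStruct is a hypothetical value-type version of the class, not the asker's actual type.

```csharp
using System;

// Hypothetical value-type version of the asker's class.
public struct CalcStruct
{
    public int Sum;

    public void AddToSum(int value)
    {
        Sum += value;
    }
}

public static class StructArrayDemo
{
    public static void Main()
    {
        // Each element is a zero-initialized CalcStruct stored inline in the array,
        // so no per-element "new" is needed.
        CalcStruct[] calculators = new CalcStruct[10];

        calculators[0].AddToSum(10);
        Console.WriteLine(calculators[0].Sum); // 10
        Console.WriteLine(calculators[1].Sum); // 0
    }
}
```

This works because an array element access is a variable, so the method mutates the element in place. Mutable structs have their own pitfalls (copies are made on assignment), so treat this as an illustration rather than a recommendation.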

I don't think you allocate the memory again, but you still need to instantiate some value for calculators[0].
In your first code segment, you are trying to call .AddToSum on a value that is null.
Ps: You could do the following instead, to initialize each Calc from the start:
Calc[] calculators = new Calc[10]{
new Calc(),
new Calc(),
...,
// Repeat 10 times to match array length
};
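Update: In response to the comments below; OK, try this then (note that Enumerable.Repeat(new Calc(), 127) would put the same single instance in every slot, so Select is used to create a new object per element):
Calc[] calculators = Enumerable.Range(0, 127).Select(_ => new Calc()).ToArray();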
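A caveat worth checking when initializing with LINQ: Enumerable.Repeat(new Calc(), n) evaluates new Calc() once and repeats that one reference, so every element ends up pointing at the same object. The sketch below uses a made-up SimpleCalc class to show the difference from creating one object per element.

```csharp
using System;
using System.Linq;

// Hypothetical class standing in for Calc.
public class SimpleCalc
{
    public int Sum;
}

public static class RepeatPitfallDemo
{
    // Every slot holds the same single instance.
    public static SimpleCalc[] MakeShared(int n)
    {
        return Enumerable.Repeat(new SimpleCalc(), n).ToArray();
    }

    // Every slot holds its own instance.
    public static SimpleCalc[] MakeDistinct(int n)
    {
        return Enumerable.Range(0, n).Select(_ => new SimpleCalc()).ToArray();
    }

    public static void Main()
    {
        SimpleCalc[] shared = MakeShared(3);
        shared[0].Sum = 5;
        Console.WriteLine(shared[1].Sum);   // 5: same object as shared[0]

        SimpleCalc[] distinct = MakeDistinct(3);
        distinct[0].Sum = 5;
        Console.WriteLine(distinct[1].Sum); // 0: separate objects
    }
}
```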

When you create an array of objects in c++ you allocate memory for all the fields of each object. So if your objects have two integer fields and you make an array of size two, enough memory is allocated to hold four integers.
On the other hand, in C# when you make an array of objects you are creating an array of references (pointers to objects). So you cannot store an instance unless you allocate memory for each reference (by using new).
The same thing in c++ would be making an array of pointers, and then you'll have to instantiate each element of your array.

Your C++ code is also wrong.
In C++ you've allocated an array with space for 10 Calculator objects.
When you do the operation, it's reading from that (uninitialized) memory, grabbing a value, and adding to it, then writing that back out.
But you've got an uninitialized object to start from.
It likely works in C++ because Calculator does not depend on a user-written constructor to set up its state. Note that new Calculator[10] does default-construct every element: if Calculator declares a default constructor, it runs once per element, and only fields of built-in type that no constructor or initializer sets are left uninitialized.
Anyway, to directly answer the question, this is the way C# works. Allocating an array creates space for the array, but all elements within the array (assuming reference types) are null until they are themselves allocated.
Think of it this way: I create an array to hold 10 objects of Class X. But X has a constructor that takes a string, and I want to call it with a different string for each of those objects. How would one do so without explicitly creating each of those 10 objects and passing the right string to each constructor?

How are strings stored in an object array?

object[] objs = new object[]{"one","two","three"};
Are the strings stored in the array as references to the string objects
[#] - one
[#] - two
[#] - three
or are the string objects stored in the array elements?
[one][two][three]
Thanks.
Edit: Sorry, my fancy diagram failed miserably.

String objects can never be stored directly in an array, or as any other variable. It's always references, even in a simple case such as:
string x = "foo";
Here the value of x is a reference, not an object. No expression value is ever an object - it's always either a reference, a value type value, or a pointer.
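A quick way to see this for yourself is a sketch like the one below. It builds the string at run time with new string('x', 3) so that literal interning does not blur the result.

```csharp
using System;

public static class StringReferenceDemo
{
    public static void Main()
    {
        // Build the string at run time so interning of literals is not a factor.
        string s = new string('x', 3);

        // Both elements receive a copy of the reference held in s.
        object[] objs = new object[] { s, s };

        Console.WriteLine(ReferenceEquals(objs[0], s));       // True
        Console.WriteLine(ReferenceEquals(objs[0], objs[1])); // True: one string object, two references
    }
}
```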

Jon Skeet describes the actual implementation very well, but let's consider why it would be nonsensical for the CLR to store strings directly in an array.
The first reason is that storing strings directly in the array would harm performance. If strings were stored directly in an array, then to get to the element 1000 of the array the CLR would have to walk through the bytes of all the strings in the array until it reached element 1000, checking all the while for string boundaries. Since strings and any other reference types are stored in arrays as references, finding the right element of the array requires one multiplication, one addition, and following one pointer (the notion of a pointer here is at the implementation level, not the programmer-visible level). This produces much better performance.
The second reason that strings cannot reasonably be stored directly in an array is that C# arrays of reference type are covariant. Let's say that strings were stored directly in the array generated with
string[] strings = new string[] {"one", "two", "three"};
Then, you cast this to an object array, which is legal
object[] objs = (object[])strings;
How is the compiler supposed to generate code that takes this possibility into account? A method that takes an object array as a parameter can have a string array passed to it, so the CLR needs to know whether to index into the array as an object array, or a string array, or some other type of array. Somehow, at runtime every array would have to be marked with the type declaration of the array, and every array access would have to check the type declaration and then traverse the array differently depending on the type of the array. It's far simpler to stick with references, which allow a single implementation of array accesses and improve performance to boot.
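The covariance mentioned above can be tried out with a short sketch. TryStore is a made-up helper. Note that the runtime does keep track of the array's real element type, and it uses that to reject stores of the wrong type.

```csharp
using System;

public static class CovarianceDemo
{
    // Tries to store 'value' in slot 0; returns false if the runtime rejects it.
    public static bool TryStore(object[] target, object value)
    {
        try
        {
            target[0] = value;
            return true;
        }
        catch (ArrayTypeMismatchException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string[] strings = new string[] { "one", "two", "three" };
        object[] objs = strings; // legal: arrays of reference types are covariant

        Console.WriteLine(objs.GetType() == typeof(string[])); // True: still a string[]
        Console.WriteLine(TryStore(objs, "four"));             // True
        Console.WriteLine(TryStore(objs, 42));                 // False: a boxed int is not a string
        Console.WriteLine(strings[0]);                         // four
    }
}
```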

They're stored internally as references. A copy of the string is stored, and anywhere that string is used, there's a reference to the same stored string. (this is one of many reasons that strings are immutable; otherwise, modifying one instance of a string would modify everywhere it appeared)

All value types (including the primitive numeric types) are stored directly in an array, but all reference types are stored as references. This is true for all objects, not just strings.

What happens when I call a function in a DLL

I use a dll compiled from C++ code (LPSolve, see http://lpsolve.sourceforge.net/5.5/), from my C# code. I use the API to build a linear programming model, and subsequently solve it. I call functions such as:
[DllImport("lpsolve55.dll", SetLastError = true)]
public static extern bool add_columnex
(int lp, int count, double[] column, int[] rowno);
I am wondering what happens, memory-wise, when I call such a function and the ints and arrays that I created in managed code leave scope (in the C# code). Will they be eligible for garbage collection? What does this mean for the C++ code? Or are they ineligible, and in that case: why?

Because the function prototype is using plain old datatypes and arrays, the memory for these values is pinned in place and then the native code acts directly on the data. Then when the function returns, the memory is unpinned and can be garbage collected.
In other words, they never leave the scope.
In terms of the C++ code, if it needs to store any of the data then it will need to take a copy of the data passed into it and then manage that memory itself.

I think Nick has covered the basic part; this is just to add more information. Arrays of int/double are blittable types (types that have the same layout in the managed and unmanaged worlds), and these typically get pinned when marshalled, so you don't have to worry about the GC during the call. Also, what you have done amounts to passing the array by value; in that case the marshaller treats it as an In parameter. If your unmanaged DLL is going to update values in the array, I would suggest you mark it as an In/Out parameter (e.g. [In, Out] double[] column). For more info:
Blittable and Non-Blittable Types:
http://msdn.microsoft.com/en-us/library/75dwhxf7.aspx
Copying and Pinning: http://msdn.microsoft.com/en-us/library/23acw07k.aspx
Marshalling Arrays: http://learning.infocollections.com/ebook%202/Computer/Programming/General/Programming.With.Microsoft.Dot.NET/LiB0948.htm
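To get a feel for what pinning does, here is a sketch that pins an array by hand with GCHandle and writes through the raw address. It does not call lpsolve55.dll at all; the write through Marshal.Copy merely stands in for what native code would do with the pointer it receives, and WriteThroughPinnedPointer is a name made up for the example.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    // Pins 'column', writes 'newFirst' into element 0 through the raw address
    // (simulating a write by native code), then unpins the array again.
    public static void WriteThroughPinnedPointer(double[] column, double newFirst)
    {
        GCHandle handle = GCHandle.Alloc(column, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Marshal.Copy(new[] { newFirst }, 0, address, 1);
        }
        finally
        {
            // After Free, the GC may move or collect the array as usual.
            handle.Free();
        }
    }

    public static void Main()
    {
        double[] column = { 1.5, 2.5 };
        WriteThroughPinnedPointer(column, 9.0);
        Console.WriteLine(column[0]); // 9
        Console.WriteLine(column[1]); // 2.5
    }
}
```

With a plain DllImport call the marshaller does the pin and unpin around the call for you; manual pinning like this only matters if native code has to keep using the address after the call returns.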

If your app doesn't crash with an AccessViolationException after using it for a while (past a garbage collection) then it is pretty safe to assume that the unmanaged code made a copy of the array elements you passed it. This is the normal thing to do, the library would otherwise be very hard to use from native code as well. There also ought to be an API function that lets you clear or re-initialize the model, that one should be releasing the memory.

The C or C++ function has a prototype like this:
bool add_columnex(int lp, int count, double *column, int *rowno);
The parameters lp and count are passed by value. The parameters column and rowno are also passed by value, but they are pointers, so the actual array data is passed by reference and add_columnex has to dereference column and rowno. This dereferencing is only allowed during the function call. When the function returns, these parameters are out of scope. What kind of dereferencing is allowed must be part of the interface contract.
When the function returns, all arguments go out of scope and the function has no way to do anything with them after the call. If the function stores copies of the arguments themselves, i.e. the addresses of the column or rowno arrays, this must be allowed by the contract. In that case you would get into trouble. A better contract is that the function must copy the dereferenced data.
