What's the C# equivalent of C++ vector?
I am searching for this feature:
To have a dynamic array of contiguously stored memory that has no performance penalty for access vs. standard arrays.
From what I have read, the .NET equivalent of the C++ vector is said to be ArrayList, so:
Does ArrayList have that contiguous memory feature?
You could use a List<T> and when T is a value type it will be allocated in contiguous memory which would not be the case if T is a reference type.
Example:
List<int> integers = new List<int>();
integers.Add(1);
integers.Add(4);
integers.Add(7);
int someElement = integers[1];
Use List<T>. Internally it uses arrays, and arrays do use contiguous memory.
C# has a lot of reference types. Even if a container stores the references contiguously, the objects themselves may be scattered through the heap.
First of all, stay away from ArrayList and Hashtable. Those classes are to be considered deprecated in favor of generics; they are still in the framework for legacy purposes.
Now, what you are looking for is the List<T> class. Note that if T is a value type you will have contiguous memory for the elements themselves, but if T is a reference type only the references are contiguous, since the objects live elsewhere on the heap.
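To make the value type vs. reference type distinction concrete, here is a minimal sketch. The names PointStruct, PointClass and ContiguityDemo are made up for illustration, and CollectionsMarshal.AsSpan assumes .NET 5 or later. It views the list's backing array as a single span for a struct element type, and shows that a list of class instances only holds references.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public struct PointStruct { public int X; public int Y; }        // value type (hypothetical)
public sealed class PointClass { public int X; public int Y; }   // reference type (hypothetical)

public static class ContiguityDemo
{
    // Sums X over the list's backing array, viewed as one contiguous span.
    public static int SumX(List<PointStruct> points)
    {
        // AsSpan exposes the internal PointStruct[] without copying (.NET 5+).
        Span<PointStruct> span = CollectionsMarshal.AsSpan(points);
        int sum = 0;
        for (int i = 0; i < span.Length; i++)
        {
            sum += span[i].X;
        }
        return sum;
    }

    // A List<PointClass> stores references: two slots can refer to the same object.
    public static bool SlotsShareObject()
    {
        var shared = new PointClass();
        var list = new List<PointClass> { shared, shared };
        list[0].X = 5;           // mutate through slot 0
        return list[1].X == 5;   // the change is visible through slot 1
    }
}
```

With the struct element type, the X and Y fields sit directly in the backing array; with the class element type, the array only contains references and the objects may be anywhere on the heap.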
It looks like CLR / C# might be getting better support for Vector<> soon.
http://blogs.msdn.com/b/dotnet/archive/2014/04/07/the-jit-finally-proposed-jit-and-simd-are-getting-married.aspx
I'm learning C# and basically know the difference between arrays and Lists that the last is a generic and can dynamically grow but I'm wondering:
are List elements sequentially located in heap like array or is each element located "randomly" in a different location?
and if that is true, does that affect the speed of access & data retrieval from memory?
and if that is true, is this what makes arrays a little faster than Lists?
Let's see the second and the third questions first:
and if that is true, does that affect the speed of access & data retrieval from memory?
and if that is true, is this what makes arrays a little faster than Lists?
There is only a single type of "native" collection in .NET (by .NET I mean the CLR, i.e. the runtime): the array. (Technically, if you consider a string a type of collection, then there are two native types of collections :-). Also technically, not all the arrays you think of as arrays are "native" arrays: only the one-dimensional, zero-based arrays are. Arrays of type T[,] aren't, and arrays whose first element doesn't have an index of 0 aren't.)
Every other collection (other than LinkedList<>) is built atop it. If you look at List<T> with ILSpy you'll see that at the base of it there is a T[] with an added int for the Count (the T[].Length is the Capacity). Clearly an array is a little faster than a List<T> because to use it you have one less indirection (you access the array directly, instead of accessing the list, which in turn accesses the array).
Let's see the first question:
are List elements sequentially located in heap like array or is each element located "randomly" in a different location?
Being based on an array internally, the List<> clearly stores its elements like an array, that is, in a contiguous block of memory. Be aware, though, that with a List<SomeObject> where SomeObject is a reference type, the list is a list of references, not of objects, so it is the references that are put in a contiguous block of memory. (We will ignore that, with the advanced memory management of modern computers, "contiguous block of memory" isn't exact; it would be better to say "a contiguous block of addresses".)
(Yes, even Dictionary<> and HashSet<> are built atop arrays. Conversely, a tree-like collection could be built without using an array, because it's more similar to a LinkedList<>.)
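The "T[] plus an int for the Count" layout can be sketched roughly as follows. TinyList<T> is a hypothetical, heavily simplified stand-in and not the real List<T> source; the initial capacity of 4 and the doubling policy are assumptions made for the example.

```csharp
using System;

// Hypothetical, simplified stand-in for List<T>: an array plus a count.
public sealed class TinyList<T>
{
    private T[] _items = new T[4];   // _items.Length plays the role of Capacity
    private int _count;              // number of slots actually in use

    public int Count => _count;
    public int Capacity => _items.Length;

    public void Add(T item)
    {
        // When the backing array is full, allocate a larger one and copy.
        if (_count == _items.Length)
        {
            Array.Resize(ref _items, _items.Length * 2);
        }
        _items[_count++] = item;
    }

    public T this[int index]
    {
        get
        {
            // The extra bounds check and field load are the "one more indirection".
            if ((uint)index >= (uint)_count)
            {
                throw new ArgumentOutOfRangeException(nameof(index));
            }
            return _items[index];
        }
    }
}
```

Reading an element goes through the indexer to the array, which is the small extra cost compared with indexing a raw array directly.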
Some additional details: there are four groups of instructions in the CIL language (the intermediate language used in compiled .NET programs) that are used with "native" arrays:
Newarr
Ldelem and family Ldelem_*
Stelem and family Stelem_*
Readonly (don't ask me what it is for; I don't know, and the documentation isn't clear)
If you look at OpCodes.Newarr you'll see this comment in the XML documentation:
// Summary:
// Pushes an object reference to a new zero-based, one-dimensional array whose
// elements are of a specific type onto the evaluation stack.
Yes, elements in a List are stored contiguously, just like an array. A List actually uses arrays internally, but that is an implementation detail that you shouldn't really need to be concerned with.
Of course, in order to get the correct impression from that statement, you also have to understand a bit about memory management in .NET. Namely, the difference between value types and reference types, and how objects of those types are stored. Value types will be stored in contiguous memory. With reference types, the references will be stored in contiguous memory, but not the instances themselves.
The advantage of using a List is that the logic inside of the class handles allocating and managing the items for you. You can add elements anywhere, remove elements from anywhere, and grow the entire size of the collection without having to do any extra work. This is, of course, also what makes a List slightly slower than an array. If any reallocation has to happen in order to comply with your request, there'll be a performance hit as a new, larger-sized array is allocated and the elements are copied to it. But it won't be any slower than if you wrote the code to do it manually with a raw array.
If your length requirement is fixed (i.e., you never need to grow/expand the total capacity of the array), you can go ahead and use a raw array. It might even be marginally faster than a List because it avoids the extra overhead and indirection (although that is subject to being optimized out by the JIT compiler).
If you need to be able to dynamically resize the collection, or you need any of the other features provided by the List class, just use a List. The performance difference will be virtually imperceptible.
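The reallocation cost mentioned above can be observed by watching List<T>.Capacity change while adding elements. This is only a sketch; GrowthDemo and CountReallocations are names invented for the example. Passing an initial capacity to the constructor avoids the intermediate copies when the final size is known up front.

```csharp
using System.Collections.Generic;

public static class GrowthDemo
{
    // Counts how many times the backing array is replaced while adding n items.
    public static int CountReallocations(int n, bool preallocate)
    {
        List<int> list = preallocate ? new List<int>(n) : new List<int>();
        int reallocations = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < n; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Capacity changed, so a new array was allocated and the items copied.
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }
}
```

For example, GrowthDemo.CountReallocations(1000, preallocate: true) reports no reallocations, whereas the default constructor has to grow several times on the way to 1000 elements.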
From the perspective of memory or processor costs, does there appear to be a significant difference between an array and an arrayList object?
An array is a low-level data structure that essentially maps to a region in memory. An ArrayList is a variable length list implemented as an array of object that is re-allocated as the list grows.
ArrayList therefore has some overhead related to managing the size of the internal array, and more overhead related to casting objects to the correct type when you access the list.
Also, storing everything as object means that value types get boxed on write and unboxed on read, which is extremely detrimental to performance. Using List<T>, a similar but strongly-typed variable-size list, avoids this issue.
In fact, ArrayList is practically deprecated in favor of List<T> since .NET 2.0.
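A small sketch of the difference in practice (BoxingDemo and its method names are invented for illustration): with ArrayList every int is stored as a boxed object and has to be cast back on read, while List<int> stores and returns plain ints.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // ArrayList holds object, so each element must be unboxed with a cast.
    public static int SumArrayList(ArrayList values)
    {
        int sum = 0;
        foreach (object boxed in values)
        {
            sum += (int)boxed;   // unboxing; throws InvalidCastException if it is not an int
        }
        return sum;
    }

    // List<int> holds ints directly: no boxing, no cast, checked at compile time.
    public static int SumList(List<int> values)
    {
        int sum = 0;
        foreach (int value in values)
        {
            sum += value;
        }
        return sum;
    }
}
```

Besides the boxing cost, the ArrayList version only fails at run time if something other than an int was added, which the generic version rules out at compile time.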
An array is a contiguous block of memory of fixed size, whereas an ArrayList (though you should prefer List<T> since .NET 2.0) wraps an array to provide dynamically-resizable storage.
The difference between them, as far as their interfaces are concerned, is that an ArrayList is resizable and an array isn't. As far as the implementation is concerned: because an ArrayList wraps (and reallocates) arrays, it requires slightly more memory than an array (it has to track the current number of elements as well as its capacity). Furthermore, an ArrayList requires CPU time to reallocate and copy its internal array whenever it reaches its internal capacity.
However, instantiating an ArrayList is no more expensive than allocating an array. The only difference there being the handful of instructions needed to initialize the ArrayList's state. The difference is negligible and not worth worrying about.
You'll find that if you are reallocating an array by yourself as the means of creating a resizable collection then you're better off using ArrayList/List as it has been thoroughly tested.
I recently came across the link below which I have found quite interesting.
http://en.wikipedia.org/wiki/XOR_linked_list
General-purpose debugging tools cannot follow the XOR chain, making debugging more difficult;
The price for the decrease in memory usage is an increase in code complexity, making maintenance more expensive;
Most garbage collection schemes do not work with data structures that do not contain literal pointers;
XOR of pointers is not defined in some contexts (e.g., the C language), although many languages provide some kind of type conversion between pointers and integers;
The pointers will be unreadable if one isn't traversing the list, for example, if the pointer to a list item was contained in another data structure;
While traversing the list you need to remember the address of the previously accessed node in order to calculate the next node's address.
Now I am wondering if that is exclusive to low level languages or if that is also possible within C#?
Are there any similar options to produce the same results with C#?
TL;DR I quickly wrote a proof-of-concept XorLinkedList implementation in C#.
This is absolutely possible using unsafe code in C#. There are a few restrictions, though:
The nodes of an XorLinkedList must be "unmanaged structs", i.e., they cannot contain managed references
Due to a limitation in C# generics, the linked list cannot be generic (not even with where T : struct)
The latter is because you cannot restrict the generic parameter to unmanaged structs. With just where T : struct you'd also allow structs that contain managed references. (Newer compilers, C# 7.3 and later, do offer a where T : unmanaged constraint.)
This means that your XorLinkedList can only hold primitive values like ints, pointers or other unmanaged structs.
Low-level programming in C#
private static Node* _ptrXor(Node* a, Node* b)
{
return (Node*)((ulong)a ^ (ulong)b);//very fragile
}
Very fragile, I know. C# pointers and IntPtr do not support the XOR-operator (probably a good idea).
private static Node* _allocate(Node* link, int value = 0)
{
var node = (Node*) Marshal.AllocHGlobal(sizeof (Node));
node->xorLink = link;
node->value = value;
return node;
}
Don't forget to Marshal.FreeHGlobal those nodes afterwards (implement the full IDisposable pattern and be sure to place the free calls outside the if (disposing) block).
private static Node* _insertMiddle(Node* first, Node* second, int value)
{
var node = _allocate(_ptrXor(first, second), value);
var prev = _prev(first, second);
first->xorLink = _ptrXor(prev, node);
var next = _next(first, second);
second->xorLink = _ptrXor(node, next);
return node;
}
Conclusion
Personally, I would never use an XorLinkedList in C# (maybe in C, when I'm writing really low-level system stuff like memory allocators or kernel data structures). In any other setting the small gain in storage efficiency is really not worth the pain. The fact that you can't use it together with managed objects in C# renders it pretty much useless for everyday programming.
Also, storage is almost free today, even main memory, and if you're using C# you likely don't care about storage much. I've read somewhere that CLR object headers were around 40 bytes, so this one pointer will be the least of your concerns ;)
C# doesn't generally let you manipulate references at that level, so no, unfortunately.
As an alternative to the unsafe solutions that have been proposed.
If you backed your linked list with an array or list collection where instead of a memory pointer 'next' and 'previous' indicate indexes into the array you could implement this xor without resorting to using unsafe features.
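A rough sketch of that index-based idea, with no unsafe code. XorIndexList is a hypothetical name, and the design choices (append-only, int payload, index 0 reserved as the "null" link so that XOR with it is the identity) are assumptions made to keep the example short.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: an XOR linked list over indexes instead of pointers.
// Index 0 is reserved as "null", so real nodes start at index 1.
public sealed class XorIndexList
{
    private readonly List<int> _values = new List<int> { 0 }; // slot 0 = null sentinel
    private readonly List<int> _links = new List<int> { 0 };  // prevIndex ^ nextIndex per node
    private int _head; // 0 means the list is empty
    private int _tail;

    public void AddLast(int value)
    {
        int node = _values.Count;        // index of the new node
        _values.Add(value);
        _links.Add(_tail ^ 0);           // prev = old tail, next = null (0)
        if (_tail == 0)
        {
            _head = node;                // first node in the list
        }
        else
        {
            _links[_tail] ^= node;       // old tail: (prev ^ 0) becomes (prev ^ node)
        }
        _tail = node;
    }

    public IEnumerable<int> Forward() => Walk(_head);
    public IEnumerable<int> Backward() => Walk(_tail);

    // The same walk works in both directions; only the starting end differs.
    private IEnumerable<int> Walk(int start)
    {
        int previous = 0;
        int current = start;
        while (current != 0)
        {
            yield return _values[current];
            int next = _links[current] ^ previous; // recover the neighbour we did not come from
            previous = current;
            current = next;
        }
    }
}
```

Because the links are plain integers, the garbage collector is not confused by them. The price is that node storage has to be managed by hand; removing nodes would need some kind of free list, which is left out here.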
There are ways to work with pointers in C#, but you can have a pointer to an object only temporarily, so you can't use them in this scenario. The main reason for this is garbage collection – as long as you can do things like XOR pointers and unXOR them later, the GC has no way of knowing whether it's safe to collect certain object or not.
You could make something very similar by emulating pointers using indexes in one big array, but you would have to implement a simple form of memory management yourself (i.e. when creating new node, where in the array should I put it?).
Another option would be to go with C++/CLI which allows you both the full flexibility of pointers on one hand and GC and access to the framework when you need it on the other.
Sure. You would just need to code the class. The XOR operator in C# is ^
That should be all you need to start the coding.
Note that this will require the code to be declared "unsafe" in order to use pointers in C#.
Making a broad generalization here: C# appears to have gone the path of readability and clean interfaces and not the path of bit fiddling and packing all the information as dense as possible.
So, unless you have a specific need here, you should use the List you are provided. Future maintenance programmers will thank you for it.
It is possible; however, you have to understand how C# looks at objects. A variable of a reference type does not actually contain an object but a reference to the object in memory.
string s = "hello";
s is a reference to a string object on the heap. (A variable of a value type such as DateTime, by contrast, holds the data itself rather than a reference.)
So you could build this type of linked list, although I am not sure why you would, as the framework has typically already implemented the most efficient collections. As a thought experiment it is possible.
In Delphi, the "ZeroMemory" procedure takes two parameters.
CODE EXAMPLE
procedure ZeroMemory(Destination: Pointer; Length: DWORD);
begin
FillChar(Destination^, Length, 0);
end;
I want to do this, or something similar, in C#. So, what's the equivalent?
thanks in advance!
.NET framework objects are always initialized to a known state
.NET framework value types are automatically 'zeroed' -- which means that the framework guarantees that it is initialized into its natural default value before it returns it to you for use. Things that are made up of value types (e.g. arrays, structs, objects) have their fields similarly initialized.
In general, in .NET all managed objects are initialized to default, and there is never a case when the contents of an object is unpredictable (because it contains data that just happens to be in that particular memory location) as in other unmanaged environments.
Answer: you don't need to do this, as .NET will automatically "zero" the object for you. However, you should know what the default value for each value type is. For example, the default of a bool is false, and the default of an int is zero.
Unmanaged objects
"Zero-ing" a region of memory is usually only necessary in interoping with external, non-managed libraries.
If you have a pinned pointer to a region of memory containing data that you intend to pass to an outside non-managed library (written in C, say), and you want to zero that section of memory, then your pointer most likely points to a byte array and you can use a simple for-loop to zero it.
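For the unmanaged case, such a loop could look roughly like this. NativeZero is a name invented for the example; it assumes the block is addressed through an IntPtr (for instance one obtained from Marshal.AllocHGlobal) and uses Marshal.WriteByte so that no unsafe context is needed.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeZero
{
    // Byte-by-byte counterpart of Delphi's ZeroMemory for an unmanaged block.
    // The caller is responsible for destination pointing at at least 'length' bytes.
    public static void ZeroMemory(IntPtr destination, int length)
    {
        for (int offset = 0; offset < length; offset++)
        {
            Marshal.WriteByte(destination, offset, 0);
        }
    }
}
```

This is deliberately the simplest possible version; for large buffers a bulk operation would be faster than writing one byte at a time.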
Off-topic note
On the flip side, if a large object is allocated in .NET, try to reuse it instead of throwing it away and allocating a new one. That's because any new object is automatically "zeroed" by the .NET framework, and for large objects this clearing will cause a hidden performance hit.
You very rarely need unsafe code in C#. Usually only when interacting with native libraries.
The Marshal class has some low-level helper functions, but I'm not aware of any that zeroes out memory.
Firstly, in .NET (including C#) value types are zero by default, which takes away one of the common uses of ZeroMemory.
Secondly, if you want to zero a list of type T then try a method like:
// Resets every element of the list to the default value of T (0, false, null, ...).
void ZeroMemory<T>(IList<T> destination)
{
for (var i = 0; i < destination.Count; i++)
{
destination[i] = default(T);
}
}
If a list isn't available... then I think I'd need to see more of the calling code.
Technically there is the Array.Clear, but it's only for managed arrays. What do you want to do?
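For managed arrays, a short sketch of what that looks like (ManagedZero and its method names are made up for the example). Array.Clear resets a range to default values, and on runtimes that have Span<T>, AsSpan(...).Clear() does the same for a slice.

```csharp
using System;

public static class ManagedZero
{
    // Resets every element of a managed array to 0.
    public static void ClearAll(int[] data)
    {
        Array.Clear(data, 0, data.Length);
    }

    // Resets only buffer[start .. start + length - 1], leaving the rest untouched.
    public static void ClearRange(byte[] buffer, int start, int length)
    {
        buffer.AsSpan(start, length).Clear();
    }
}
```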
If one could put an array of pointers to child structs inside unsafe structs in C# like one could in C, constructing complex data structures without the overhead of having one object per node would be a lot easier and less of a time sink, as well as syntactically cleaner and much more readable.
Is there a deep architectural reason why fixed arrays inside unsafe structs are only allowed to be composed of "value types" and not pointers?
I assume only having explicitly named pointers inside structs must be a deliberate decision to weaken the language, but I can't find any documentation about why this is so, or the reasoning for not allowing pointer arrays inside structs, since I would assume the garbage collector shouldn't care what is going on in structs marked as unsafe.
Digital Mars' D handles structs and pointers elegantly in comparison, and I'm missing not being able to rapidly develop succinct data structures; by making references abstract in C# a lot of power seems to have been removed from the language, even though pointers are still there at least in a marketing sense.
Maybe I'm wrong to expect languages to become more powerful at representing complex data structures efficiently over time.
One very simple reason: dotNET has a compacting garbage collector. It moves things around. So even if you could create arrays like that, you would have to pin every allocated block and you would see the system slow down to a crawl.
But you are trying to optimize based on an assumption. Allocation and cleanup of objects in dotNET is highly optimized. So write a working program first and then use a profiler to find your bottlenecks. It will most likely not be the allocation of your objects.
Edit, to answer the latter part:
Maybe I'm wrong to expect languages to become more powerful at representing complex data structures efficiently over time.
I think C# (or any managed language) is much more powerful at representing complex data structures (efficiently), precisely because it moved from low-level pointers to garbage-collected references.
I'm just guessing, but it might have to do with different pointer sizes for different target platforms. It seems that the C# compiler is using the size of the elements directly for index calculations (i.e. there is no CLR support for calculating fixed sized buffers indices...)
Anyway you can use an array of ulongs and cast the pointers to it:
unsafe struct s1
{
public int a;
public int b;
}
unsafe struct s
{
public fixed ulong otherStruct[100];
}
unsafe void f() {
var S = new s();
var S1 = new s1();
S.otherStruct[4] = (ulong)&S1;
var S2 = (s1*)S.otherStruct[4];
}
Putting a fixed array of pointers in a struct would quickly make it a bad candidate for a struct. The recommended size limit for a struct is 16 bytes, so on an x64 system you would be able to fit only two pointers in the array, which is pretty pointless.
You should use classes for complex data structures; if you use structs, they become very limited in their usage. You wouldn't, for example, be able to create a data structure in a method and return it, as it would then contain pointers to structs that no longer exist because they were allocated in the stack frame of the method.