XOR linked list - c#

I recently came across the link below which I have found quite interesting.
http://en.wikipedia.org/wiki/XOR_linked_list
General-purpose debugging tools
cannot follow the XOR chain, making
debugging more difficult; [1]
The price for the decrease in memory
usage is an increase in code
complexity, making maintenance more
expensive;
Most garbage collection schemes do
not work with data structures that do
not contain literal pointers;
XOR of pointers is not defined in
some contexts (e.g., the C language),
although many languages provide some
kind of type conversion between
pointers and integers;
The pointers will be unreadable if
one isn't traversing the list — for
example, if the pointer to a list
item was contained in another data
structure;
While traversing the list you need to
remember the address of the
previously accessed node in order to
calculate the next node's address.
Now I am wondering if that is exclusive to low level languages or if that is also possible within C#?
Are there any similar options to produce the same results with C#?

TL;DR I quickly wrote a proof-of-concept XorLinkedList implementation in C#.
This is absolutely possible using unsafe code in C#. There are a few restrictions, though:
XorLinkedList must be "unmanaged structs", i.e., they cannot contain managed references
Due to a limitation in C# generics, the linked list cannot be generic (not even with where T : struct)
The latter seems to be because you cannot restrict the generic parameter to unmanaged structs. With just where T : struct you'd also allow structs that contain managed references.
This means that your XorLinkedList can only hold primitive values like ints, pointers or other unmanaged structs.
Low-level programming in C#
private static Node* _ptrXor(Node* a, Node* b)
{
return (Node*)((ulong)a ^ (ulong)b);//very fragile
}
Very fragile, I know. C# pointers and IntPtr do not support the XOR-operator (probably a good idea).
private static Node* _allocate(Node* link, int value = 0)
{
var node = (Node*) Marshal.AllocHGlobal(sizeof (Node));
node->xorLink = link;
node->value = value;
return node;
}
Don't forget to Marshal.FreeHGlobal those nodes afterwards (Implement the full IDisposable pattern and be sure to place the free calls outside the if(disposing) block.
private static Node* _insertMiddle(Node* first, Node* second, int value)
{
var node = _allocate(_ptrXor(first, second), value);
var prev = _prev(first, second);
first->xorLink = _ptrXor(prev, node);
var next = _next(first, second);
second->xorLink = _ptrXor(node, next);
return node;
}
Conclusion
Personally, I would never use an XorLinkedList in C# (maybe in C when I'm writing really low level system stuff like memory allocators or kernel data structures. In any other setting the small gain in storage efficiency is really not worth the pain. The fact that you can't use it together with managed objects in C# renders it pretty much useless for everyday programming.
Also storage is almost free today, even main memory and if you're using C# you likely don't care about storage much. I've read somewhere that CLR object headers were around ~40 bytes, so this one pointer will be the least of your concerns ;)

C# doesn't generally let you manipulate references at that level, so no, unfortunately.

As an alternative to the unsafe solutions that have been proposed.
If you backed your linked list with an array or list collection where instead of a memory pointer 'next' and 'previous' indicate indexes into the array you could implement this xor without resorting to using unsafe features.

There are ways to work with pointers in C#, but you can have a pointer to an object only temporarily, so you can't use them in this scenario. The main reason for this is garbage collection – as long as you can do things like XOR pointers and unXOR them later, the GC has no way of knowing whether it's safe to collect certain object or not.
You could make something very similar by emulating pointers using indexes in one big array, but you would have to implement a simple form of memory management yourself (i.e. when creating new node, where in the array should I put it?).
Another option would be to go with C++/CLI which allows you both the full flexibility of pointers on one hand and GC and access to the framework when you need it on the other.

Sure. You would just need to code the class. the XOR operator in c# is ^
That should be all you need to start the coding.
Note this will require the code to be declared "unsafe." See here: for how to use pointers in c#.

Making a broad generalization here: C# appears to have gone the path of readability and clean interfaces and not the path of bit fiddling and packing all the information as dense as possible.
So, unless you have a specific need here, you should use the List you are provided. Future maintenance programmers will thank you for it.

It is possible however you have to understand how C# looks at objects. An instance variable does not actually contain an object but a pointer to the object in memory.
DateTime dt = DateTime.Now;
dt is a pointer to a struct in memory containing the DateTime scheme.
So you could do this type of linked list although I am not sure why you would as the framework typically has already implemented the most efficient collections. As a thought expirament it is possible.

Related

Is there something like C++ pointer in C#? [duplicate]

I'm writing an application that work with a tree data structure. I've written it with C++, now i want to write it by C#. I use pointers for implementing the tree data structure. Is there a pointer in C# too? Is it safe to use it?
If you're implementing a tree structure in C# (or Java, or many other languages) you'd use references instead of pointers. NB. references in C++ are not the same as these references.
The usage is similar to pointers for the most part, but there are advantages like garbage collection.
class TreeNode
{
private TreeNode parent, firstChild, nextSibling;
public InsertChild(TreeNode newChild)
{
newChild.parent = this;
newChild.nextSibling = firstChild;
firstChild = newChild;
}
}
var root = new TreeNode();
var child1 = new TreeNode();
root.InsertChild(child1);
Points of interest:
No need to modify the type with * when declaring the members
No need to set them to null in a constructor (they're already null)
No special -> operator for member access
No need to write a destructor (although look up IDisposable)
YES. There are pointers in C#.
NO. They are NOT safe.
You actually have to use keyword unsafe when you use pointers in C#.
For examples look here and MSDN.
static unsafe void Increment(int* i)
{
*i++;
}
Increment(&count);
Use this instead and code will be SAFE and CLEAN.
static void Increment(ref int i)
{
i++;
}
Increment(ref count);
Is there pointer in C# too?
Yes, declared using the syntax int* varName;.
Is using of that safe?
No pointers are not safe.
There are safe ways to construct a data structure without pointers. If the nodes are classes, then they'll automatically be reference types so you don't need any pointers. Otherwise, you can box them into a reference.
Yes, there is a pointer:
IntPtr
Wikipedia: "which is a safe managed equivalent to int*, and does not require unsafe code"
There is a great series of Data Structures implemented in .Net on Microsoft Docs.
Data Structures Overview
They include sample code for things like Binary Search Tree, Graph, SkipList, NodeList, etc. The code is quite complete and includes a number of pages of docs about why these structures work, etc.
None of the ones from Microsoft use pointers. In general, you never NEED to use them in C#. There are times when using them would be nice, or they are just the way you think from C++. But you can usually find a way not to use them.
The biggest reasons why not to use unsafe code for pointers is that you lose Medium Trust compliance. You can't run through mechanisms like click once, asp.net websites, and Silverlight doesn't allow them either. Stick with refs and fully managed concepts to ensure your code can run in more places.

Should I use Class or Struct in the following case (data structure with many fields)? [duplicate]

I'm about to create 100,000 objects in code. They are small ones, only with 2 or 3 properties. I'll put them in a generic list and when they are, I'll loop them and check value a and maybe update value b.
Is it faster/better to create these objects as class or as struct?
EDIT
a. The properties are value types (except the string i think?)
b. They might (we're not sure yet) have a validate method
EDIT 2
I was wondering: are objects on the heap and the stack processed equally by the garbage collector, or does that work different?
Is it faster to create these objects as class or as struct?
You are the only person who can determine the answer to that question. Try it both ways, measure a meaningful, user-focused, relevant performance metric, and then you'll know whether the change has a meaningful effect on real users in relevant scenarios.
Structs consume less heap memory (because they are smaller and more easily compacted, not because they are "on the stack"). But they take longer to copy than a reference copy. I don't know what your performance metrics are for memory usage or speed; there's a tradeoff here and you're the person who knows what it is.
Is it better to create these objects as class or as struct?
Maybe class, maybe struct. As a rule of thumb:
If the object is :
1. Small
2. Logically an immutable value
3. There's a lot of them
Then I'd consider making it a struct. Otherwise I'd stick with a reference type.
If you need to mutate some field of a struct it is usually better to build a constructor that returns an entire new struct with the field set correctly. That's perhaps slightly slower (measure it!) but logically much easier to reason about.
Are objects on the heap and the stack processed equally by the garbage collector?
No, they are not the same because objects on the stack are the roots of the collection. The garbage collector does not need to ever ask "is this thing on the stack alive?" because the answer to that question is always "Yes, it's on the stack". (Now, you can't rely on that to keep an object alive because the stack is an implementation detail. The jitter is allowed to introduce optimizations that, say, enregister what would normally be a stack value, and then it's never on the stack so the GC doesn't know that it is still alive. An enregistered object can have its descendents collected aggressively, as soon as the register holding onto it is not going to be read again.)
But the garbage collector does have to treat objects on the stack as alive, the same way that it treats any object known to be alive as alive. The object on the stack can refer to heap-allocated objects that need to be kept alive, so the GC has to treat stack objects like living heap-allocated objects for the purposes of determining the live set. But obviously they are not treated as "live objects" for the purposes of compacting the heap, because they're not on the heap in the first place.
Is that clear?
Sometimes with struct you don't need to call the new() constructor, and directly assign the fields making it much faster that usual.
Example:
Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
list[i].id = i;
list[i].isValid = true;
}
is about 2 to 3 times faster than
Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
list[i] = new Value(i, true);
}
where Value is a struct with two fields (id and isValid).
struct Value
{
int id;
bool isValid;
public Value(int i, bool isValid)
{
this.i = i;
this.isValid = isValid;
}
}
On the other hand is the items needs to be moved or selected value types all that copying is going to slow you down. To get the exact answer I suspect you have to profile your code and test it out.
Arrays of structs are represented on the heap in a contiguous block of memory, whereas an array of objects is represented as a contiguous block of references with the actual objects themselves elsewhere on the heap, thus requiring memory for both the objects and for their array references.
In this case, as you are placing them in a List<> (and a List<> is backed onto an array) it would be more efficient, memory-wise to use structs.
(Beware though, that large arrays will find their way on the Large Object Heap where, if their lifetime is long, may have an adverse affect on your process's memory management. Remember, also, that memory is not the only consideration.)
Structs may seem similar to classes, but there are important differences that you should be aware of. First of all, classes are reference types and structs are value types. By using structs, you can create objects that behave like the built-in types and enjoy their benefits as well.
When you call the New operator on a class, it will be allocated on the heap. However, when you instantiate a struct, it gets created on the stack. This will yield performance gains. Also, you will not be dealing with references to an instance of a struct as you would with classes. You will be working directly with the struct instance. Because of this, when passing a struct to a method, it's passed by value instead of as a reference.
More here:
http://msdn.microsoft.com/en-us/library/aa288471(VS.71).aspx
If they have value semantics, then you should probably use a struct. If they have reference semantics, then you should probably use a class. There are exceptions, which mostly lean towards creating a class even when there are value semantics, but start from there.
As for your second edit, the GC only deals with the heap, but there is a lot more heap space than stack space, so putting things on the stack isn't always a win. Besides which, a list of struct-types and a list of class-types will be on the heap either way, so this is irrelevant in this case.
Edit:
I'm beginning to consider the term evil to be harmful. After all, making a class mutable is a bad idea if it's not actively needed, and I would not rule out ever using a mutable struct. It is a poor idea so often as to almost always be a bad idea though, but mostly it just doesn't coincide with value semantics so it just doesn't make sense to use a struct in the given case.
There can be reasonable exceptions with private nested structs, where all uses of that struct are hence restricted to a very limited scope. This doesn't apply here though.
Really, I think "it mutates so it's a bad stuct" is not much better than going on about the heap and the stack (which at least does have some performance impact, even if a frequently misrepresented one). "It mutates, so it quite likely doesn't make sense to consider it as having value semantics, so it's a bad struct" is only slightly different, but importantly so I think.
The best solution is to measure, measure again, then measure some more. There may be details of what you're doing that may make a simplified, easy answer like "use structs" or "use classes" difficult.
A struct is, at its heart, nothing more nor less than an aggregation of fields. In .NET it's possible for a structure to "pretend" to be an object, and for each structure type .NET implicitly defines a heap object type with the same fields and methods which--being a heap object--will behave like an object. A variable which holds a reference to such a heap object ("boxed" structure) will exhibit reference semantics, but one which holds a struct directly is simply an aggregation of variables.
I think much of the struct-versus-class confusion stems from the fact that structures have two very different usage cases, which should have very different design guidelines, but the MS guidelines don't distinguish between them. Sometimes there is a need for something which behaves like an object; in that case, the MS guidelines are pretty reasonable, though the "16 byte limit" should probably be more like 24-32. Sometimes, however, what's needed is an aggregation of variables. A struct used for that purpose should simply consist of a bunch of public fields, and possibly an Equals override, ToString override, and IEquatable(itsType).Equals implementation. Structures which are used as aggregations of fields are not objects, and shouldn't pretend to be. From the structure's point of view, the meaning of field should be nothing more or less than "the last thing written to this field". Any additional meaning should be determined by the client code.
For example, if a variable-aggregating struct has members Minimum and Maximum, the struct itself should make no promise that Minimum <= Maximum. Code which receives such a structure as a parameter should behave as though it were passed separate Minimum and Maximum values. A requirement that Minimum be no greater than Maximum should be regarded like a requirement that a Minimum parameter be no greater than a separately-passed Maximum one.
A useful pattern to consider sometimes is to have an ExposedHolder<T> class defined something like:
class ExposedHolder<T>
{
public T Value;
ExposedHolder() { }
ExposedHolder(T val) { Value = T; }
}
If one has a List<ExposedHolder<someStruct>>, where someStruct is a variable-aggregating struct, one may do things like myList[3].Value.someField += 7;, but giving myList[3].Value to other code will give it the contents of Value rather than giving it a means of altering it. By contrast, if one used a List<someStruct>, it would be necessary to use var temp=myList[3]; temp.someField += 7; myList[3] = temp;. If one used a mutable class type, exposing the contents of myList[3] to outside code would require copying all the fields to some other object. If one used an immutable class type, or an "object-style" struct, it would be necessary to construct a new instance which was like myList[3] except for someField which was different, and then store that new instance into the list.
One additional note: If you are storing a large number of similar things, it may be good to store them in possibly-nested arrays of structures, preferably trying to keep the size of each array between 1K and 64K or so. Arrays of structures are special, in that indexing one will yield a direct reference to a structure within, so one can say "a[12].x = 5;". Although one can define array-like objects, C# does not allow for them to share such syntax with arrays.
Use classes.
On a general note. Why not update value b as you create them?
From a c++ perspective I agree that it will be slower modifying a structs properties compared to a class. But I do think that they will be faster to read from due to the struct being allocated on the stack instead of the heap. Reading data from the heap requires more checks than from the stack.
Well, if you go with struct afterall, then get rid of string and use fixed size char or byte buffer.
That's re: performance.

Lazarus pointer type in C#

I am trying to convert a lazarus project to c#. It's been good so far but i have a problem now.
There is a code like this:
xActor = record
Id: integer;
Value: integer;
Name: ansistring;
Height: integer;
Gender : Boolean;
Position : byte;
end;
I created this in c# as a struct. However, after this code i saw something like this
PxActor = ^xActor;
I've researched a bit and found out that this means create a type that includes xActor's pointers i may be wrong tho.
Then i saw a more interesting record in the code
yActor = packed record
Actor: array [1..9] of PxActor;
end;
at this point i have no idea how to convert these to c# since i have no idea what PxActor = ^xActor; means.
Thanks in advance
C# has two different kinds of an object - a reference-type, and a value-type.
The code you have is mixing the two (very common in C/Pascal), because you have a C# value-type (the struct xActor) and a reference-type referring to the same type (the pointer PxActor), which isn't quite possible in C#.
However, there are ways to emulate similar behaviour. But first, you have to think about the logical way this is supposed to work. Maybe this is just a performance tweak (e.g. the array of PxActor is used instead of xActor because you want to save up on the extra memory from having the whole xActor structure there). Perhaps it's rather that you want to have the same value available from multiple places, and mutable.
Both cases can usually be solved by simply making xActor a class - in C#, that means that it will always be a reference type (not something that makes sense in C/Pascal, but the predominant object in C# and similar languages). However, you need to manually check every place where anything of type xActor or PxActor is used, and make sure the value/reference semantics are still the same. This can lead to some hard to find bugs, depending on the original code.
Another option is to create a wrapper class. So you'll keep your struct xActor, but you'll also create a class PxActor, which will have a single xActor field. This is not an exact analogue of Pascal's pointers, but if you're careful, it can be used in a similar way. The main difference is that you can't capture a pointer to an existing value somewhere - you can only create a new PxActor, with it's own copy of xActor. Anyone accessing the PxActor instance will also be working with the same xActor instance, but anytime you store the xActor itself, it's a new copy. But in the simplest case, you can replace manual allocations of xActor and a subsequent # into new PxActor(), and any variable^.Gender access to variable.ActorField.Gender.
Just during the refactoring itself, it would also be possible to use C#'s pointers. They are a bit limited compared to C/Pascal's, but they might work for your cases. Except for that string value. However, pointers have lots of costs, so you don't want to use them unless they are a good idea, and this isn't that case :)
In other words, it's usually a bad idea to do a 1:1 translation from Pascal to C#. You need to figure out what the semantics behind all the stuff is. Finding all the usages of a given type is going to be a huge chore. Translating line-by-line can easily paint you into a corner. C# is a very safe language, and CLR is very good at making sure everyone can talk with each other, but this also means a few differences in stuff like value/reference passing and similar. Now, IIRC, Pascal has value semantics by default, so using struct and similar should mostly work well in C# as well. You'll have to analyze any use of pointers, though. Sometimes, you can use C#'s out or ref instead of a pointer - go for that. But the case you have here isn't quite like that - it's a pointer stored in another type, not just passed as an argument. However, that still doesn't tell us much about how this is supposed to work in practice - maybe it's just a way to work around Pascal's limitations; maybe you'd use List<xActor> in C# instead of array of PxActor.
Of course, that's usually the best option. Where possible, use idiomatic C#, instead of just trying to code Pascal in C#. While there are many similarities, there's also a large amount of differences that can bite you, and make your work much harder than necessary. Find the places where List<SomeType> is going to work better than array of PSomeType. Find the places where ref and out can be used instead of PSomeType. Find where struct makes sense, and where class would be better. This applies to many subtle differences that can ruin your day, so you'll probably be finding many issues over time, but it's a journey :)

What is the equivalent of Delphi "ZeroMemory" in C#?

In delphi the "ZeroMemory" procedure, ask for two parameters.
CODE EXAMPLE
procedure ZeroMemory(Destination: Pointer; Length: DWORD);
begin
FillChar(Destination^, Length, 0);
end;
I want make this, or similar in C#... so, what's their equivalent?
thanks in advance!
.NET framework objects are always initialized to a known state
.NET framework value types are automatically 'zeroed' -- which means that the framework guarantees that it is initialized into its natural default value before it returns it to you for use. Things that are made up of value types (e.g. arrays, structs, objects) have their fields similarly initialized.
In general, in .NET all managed objects are initialized to default, and there is never a case when the contents of an object is unpredictable (because it contains data that just happens to be in that particular memory location) as in other unmanaged environments.
Answer: you don't need to do this, as .NET will automatically "zero" the object for you. However, you should know what the default value for each value type is. For example, the default of a bool is false, and the default of an int is zero.
Unmanaged objects
"Zero-ing" a region of memory is usually only necessary in interoping with external, non-managed libraries.
If you have a pinned pointer to a region of memory containing data that you intend to pass to an outside non-managed library (written in C, say), and you want to zero that section of memory, then your pointer most likely points to a byte array and you can use a simple for-loop to zero it.
Off-topic note
On the flip side, if a large object is allocated in .NET, try to reuse it instead of throwing it away and allocating a new one. That's because any new object is automatically "zeroed" by the .NET framework, and for large objects this clearing will cause a hidden performance hit.
You very rarely need unsafe code in C#. Usually only when interacting with native libraries.
The Marshal class as some low level helper functions, but I'm not aware of any that zeros out memory.
Firstly, in .Net (including C#) then value types are zero by default - so this takes away one of the common uses of ZeroMemory.
Secondly, if you want to zero a list of type T then try a method like:
void ZeroMemory<T>(IList<T> destination)
{
for (var i=0;i<destination.Count; i+))
{
destination[i] = default(T);
}
}
If a list isn't available... then I think I'd need to see more of the calling code.
Technically there is the Array.Clear, but it's only for managed arrays. What do you want to do?

What is the underlying reason for not being able to put arrays of pointers in unsafe structs in C#?

If one could put an array of pointers to child structs inside unsafe structs in C# like one could in C, constructing complex data structures without the overhead of having one object per node would be a lot easier and less of a time sink, as well as syntactically cleaner and much more readable.
Is there a deep architectural reason why fixed arrays inside unsafe structs are only allowed to be composed of "value types" and not pointers?
I assume only having explicitly named pointers inside structs must be a deliberate decision to weaken the language, but I can't find any documentation about why this is so, or the reasoning for not allowing pointer arrays inside structs, since I would assume the garbage collector shouldn't care what is going on in structs marked as unsafe.
Digital Mars' D handles structs and pointers elegantly in comparison, and I'm missing not being able to rapidly develop succinct data structures; by making references abstract in C# a lot of power seems to have been removed from the language, even though pointers are still there at least in a marketing sense.
Maybe I'm wrong to expect languages to become more powerful at representing complex data structures efficiently over time.
One very simple reason: dotNET has a compacting garbage collector. It moves things around. So even if you could create arrays like that, you would have to pin every allocated block and you would see the system slow down to a crawl.
But you are trying to optimize based on an assumption. Allocation and cleanup of objects in dotNET is highly optimized. So write a working program first and then use a profiler to find your bottlenecks. It will most likely not be the allocation of your objects.
Edit, to answer the latter part:
Maybe I'm wrong to expect languages to
become more powerful at representing
complex data structures efficiently
over time.
I think C# (or any managed language) is much more powerful at representing
complex data structures (efficiently). By changing from low level pointers to garbage collected references.
I'm just guessing, but it might have to do with different pointer sizes for different target platforms. It seems that the C# compiler is using the size of the elements directly for index calculations (i.e. there is no CLR support for calculating fixed sized buffers indices...)
Anyway you can use an array of ulongs and cast the pointers to it:
unsafe struct s1
{
public int a;
public int b;
}
unsafe struct s
{
public fixed ulong otherStruct[100];
}
unsafe void f() {
var S = new s();
var S1 = new s1();
S.otherStruct[4] = (ulong)&S1;
var S2 = (s1*)S.otherStruct[4];
}
Putting a fixed array of pointers in a struct would quickly make it a bad candidate for a struct. The recommended size limit for a struct is 16 bytes, so on a x64 system you would be able to fit only two pointers in the array, which is pretty pointless.
You should use classes for complex data structures, if you use structures they become very limited in their usage. You wouldn't for example be able to create a data structure in a method and return it, as it would then contain pointers to structs that no longer exists as they were allocated in the stack frame of the method.

Categories

Resources