should a tree node have a pointer to its containing tree? - c#

I'm building a gui component that has a tree-based data model (e.g. folder structure in the file system). so the gui component basically has a collection of trees, which are just Node objects that have a key, reference to a piece of the gui component (so you can assign values to the Node object and it in turn updates the gui), and a collection of Node children.
one thing I'd like to do is be able to set "styles" that apply to each level of nodes (e.g. all top-level nodes are bold, all level-2 nodes are italic, etc). so I added this to the gui component object. to add nodes, you call AddChild on a Node object. I would like to apply the style here, since upon adding the node I know what level the node is.
problem is, the style info is only in the containing object (the gui object), so the Node doesn't know about it. I could add a "pointer" within each Node to the gui object, but that seems somehow wrong...or I could hide the Nodes and make the user only be able to add nodes through the gui object, e.g. gui.AddNode(Node new_node, Node parent), which seems inelegant.
is there a nicer design for this that I'm missing, or are the couple of ways I mentioned not really that bad?

Adding a ParentNode property to each node is "not really that bad". In fact, it's rather common. Apparently you didn't add that property because you didn't need it originally. Now you need it, so you have good reason to add it.
Alternatives include:
Writing a function to find the parent of a child, which is processor intensive.
Adding a separate class of some sort which will cache parent-child relationships, which is a total waste of effort and memory.
Essentially, adding that one pointer into an existing class is a choice to use memory to cache the parent value instead of using processor time to find it. That appears to be a good choice in this situation.
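A minimal sketch of that trade-off, assuming a hypothetical Node class (Key, Parent, Children and Level are illustrative names, not the asker's actual code). The parent reference is stored once in AddChild, and the level is then derived by walking up the chain instead of searching the whole tree:

```csharp
using System.Collections.Generic;

// Hypothetical Node type; the member names are assumptions for illustration.
public class Node
{
    private readonly List<Node> children = new List<Node>();

    public string Key { get; }
    public Node Parent { get; private set; }   // null for a root node
    public IReadOnlyList<Node> Children { get { return children; } }

    public Node(string key) { Key = key; }

    // Level is derived by following Parent references (a root is level 0).
    public int Level
    {
        get
        {
            int level = 0;
            for (Node n = Parent; n != null; n = n.Parent) level++;
            return level;
        }
    }

    public Node AddChild(Node child)
    {
        child.Parent = this;   // the one extra pointer, set at insertion time
        children.Add(child);
        return child;
    }
}
```

Walking up the chain costs time proportional to the node's depth. If that ever matters, the level could be cached in a field when the node is added.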

It seems to me that the only thing you need is a Level property on the nodes, which the GUI object can then use when rendering a Node.
But it matters whether your Tree elements are Presentation agnostic like XmlNode or GUI oriented like Windows.Forms.TreeNode. The latter has a TreeView property and there is nothing wrong with that.

I see no reason why you should not have a reference to the GUI object in the node. A node cannot exist outside the GUI object, and it is useful to be able to easily find the GUI object a node is contained in.
You may not want to tie the formatting to the level the node is at if your leaf nodes may be at different levels.
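For illustration, here is a rough sketch of a node that keeps a reference to its containing component and picks up its style when it is added. TreeGui, StyledNode, SetLevelStyle and StyleFor are hypothetical names, and the style is modelled as a plain string purely to keep the example short:

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the GUI component: it owns the per-level styles.
public class TreeGui
{
    private readonly Dictionary<int, string> levelStyles = new Dictionary<int, string>();
    public List<StyledNode> Roots { get; } = new List<StyledNode>();

    public void SetLevelStyle(int level, string style) { levelStyles[level] = style; }

    // Falls back to "normal" when no style was registered for the level.
    public string StyleFor(int level)
    {
        string style;
        return levelStyles.TryGetValue(level, out style) ? style : "normal";
    }

    public StyledNode AddRoot(string key)
    {
        var node = new StyledNode(key, this, 0);
        Roots.Add(node);
        return node;
    }
}

public class StyledNode
{
    public string Key { get; }
    public TreeGui Owner { get; }   // back-reference to the containing component
    public int Level { get; }
    public string Style { get; }
    public List<StyledNode> Children { get; } = new List<StyledNode>();

    internal StyledNode(string key, TreeGui owner, int level)
    {
        Key = key;
        Owner = owner;
        Level = level;
        Style = owner.StyleFor(level);   // looked up once, when the node is created
    }

    public StyledNode AddChild(string key)
    {
        // The child inherits the owner and sits one level deeper.
        var child = new StyledNode(key, Owner, Level + 1);
        Children.Add(child);
        return child;
    }
}
```

Because nodes can only be created through AddRoot or AddChild in this sketch, a node never exists without an owner, which matches the point above.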

Related

How is the Root Node being updated when child nodes are the ones updated when performing Tree modifications?

How does a Tree's parent node get updated with the child updates we perform? Specifically, when you do a BFS or DFS search, perform a check at a node, and then update that node, what is it in memory or in the programming language that says "Oh hey, I have to make this update to the Root node as well!"?
In my example I'm using a Trie (not really important, so I will just refer to it as a Tree). I have this BFS search below that searches through a bunch of nodes and updates them with a specific word value. The comment I have is where my question lies. That child node is currently stored in "deque." How does the program know I mean to update the value within Root, not just the value passed into the variable "deque?"
To me, what I think SHOULD be happening is Root shouldn't be updated and the only thing that gets updated is the variable "deque," and then after it's done, everything gets garbage collected and Root remains the same. Instead, Root gets updated when "deque" gets updated. Perhaps I missed this in my Data Structures course, but it has been bugging me for a while now and I've been having trouble finding resources that explain this.
    private static void BFS_UpdateAllWords(Node Root, string testword, string updatevalue)
    {
        Queue<Node> bfs_queue = new Queue<Node>();
        bfs_queue.Enqueue(Root);
        while (bfs_queue.Count > 0)
        {
            var deque = bfs_queue.Dequeue();
            foreach (string childKey in deque.Children.Keys)
            {   // Update all child nodes at the key
                if (deque.Children[childKey].Word.Equals(testword))
                {
                    // This part right here, for any type of Tree traversal
                    deque.Children[childKey].WordType = updatevalue;
                }
                bfs_queue.Enqueue(deque.Children[childKey]);
            }
        }
    }
I think you are encountering what is roughly a pass by value vs pass by reference issue. When a function is pass by value, the parameter of that function is copied such that the function gets its own distinct version and the caller won't see changes the function makes to the parameter. However, if a function is pass by reference, no copy is made and if the function makes changes to a parameter the caller will see the changes that were made.
C# and Java are a little funny here because they are always pass by value (unless you explicitly tell C# to do otherwise), but they "pass references by value" too. That is, when you pass a function a reference type (like an object or a class), what the function receives is a copy of the reference to it (not a deep copy of the underlying object). This means the underlying object can still be changed inside a function, because that function has its own reference to it.
A consequence of this is that when you pass the Enqueue method a reference type, as your Node objects are, what is stored in the queue are just (copies of) references to the same underlying objects, which live outside the queue. That is, the elements of your queue are not deep copies of your tree's nodes: they are references to the same underlying Node objects. When you Dequeue() into the "deque" variable, you're therefore just creating another alias for a node that you could also have reached by traversing directly from the Root node. So when you modify the properties of the "deque" node, you're directly modifying the properties of a Node object as it exists in the tree under the Root node: not a copy of that object with its own address in memory, but the same underlying object.
This is why when deque gets garbage collected the changes persist: deque just contained copies of references to the nodes which already existed in the Root tree. There's nothing your program had to figure out to know to make the changes you made via deque also affect nodes in the Root tree. Because deque contained references to those same underlying nodes, changes made through it were already directly modifying those nodes, and so after deque went out of scope the changes naturally persisted.
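The aliasing described above can be checked with a small sketch. TrieNode and ReferenceDemo are hypothetical names, and the node is cut down to the two members needed to show the effect:

```csharp
using System.Collections.Generic;

// Cut-down, hypothetical node type: just enough to demonstrate aliasing.
public class TrieNode
{
    public string WordType = "unset";
    public Dictionary<string, TrieNode> Children = new Dictionary<string, TrieNode>();
}

public static class ReferenceDemo
{
    // Returns true when the node taken out of the queue is the very same
    // object that is still reachable from root, and root sees the change.
    public static bool QueueHoldsSameObject(TrieNode root)
    {
        root.Children["a"] = new TrieNode();

        var queue = new Queue<TrieNode>();
        queue.Enqueue(root.Children["a"]);    // copies the reference, not the node
        TrieNode dequeued = queue.Dequeue();  // another alias for the same node

        dequeued.WordType = "noun";           // mutate through the alias

        return object.ReferenceEquals(dequeued, root.Children["a"])
            && root.Children["a"].WordType == "noun";
    }

    // Reassigning the parameter only rebinds this method's copy of the
    // reference; the caller's variable still refers to the original node.
    public static void Reassign(TrieNode node)
    {
        node = new TrieNode { WordType = "replaced" };
    }
}
```

The first method shows why the update is visible from Root. The second shows the other half of "references passed by value": assigning a new object to the parameter does not affect the caller.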

Linked Lists : When adding a element why is current.Next pointing to the new Node, and why do we overwrite the Current Node

I am a beginner to C# and I am picking it up by solving data structures case scenarios. I need help visualizing what is happening in the following code snippet
    public void AddAtLast(object data)
    {
        Node newNode = new Node();
        newNode.Value = data;
        current.Next = newNode;
        current = newNode;
        Count++;
    }
What part I have understood
I am aware that a new node is being added at the end of the linked list. Also, the new node is getting its value from the function argument.
What I need help with
I am particularly thinking why current.Next is pointing to newNode, shouldn't it point to NULL since my newNode will be placed at the end of the linked list and so it should point to NULL.
Also, why are we doing current=newNode ?
I understand why count++ is present probably because that want to keep track of position at which the new element is added but correct me if my understanding is wrong with this.
So let's see what's happening line by line in the AddAtLast(object data) method of the linked list class.
Node newNode = new Node();
Create a new Node; this is the AddAtLast method's goal in life.
newNode.Value = data;
Assign some data to the Node.
current.Next = newNode;
Point the Next reference of the current (last) node at the newNode that was just created. This is the "linked" part of a linked list. Note that newNode.Next itself is still null, so the list still ends in null.
current = newNode;
Overwrite current (this must seem strange); I'll explain more about this later.
Count++;
Increment the Count of the linked list. It's nice to know the size of a list without having to traverse all its elements; this is just a shorthand way of always knowing the count.
The first thing you have to remember
In C# (and many other languages), objects/classes are reference types. When you create current (or any other object), you are doing two things:
Reserving a physical part of memory and filling it with your new Object
Creating a Reference (aka Address, aka Pointer) to that memory. Think of addresses just like a Post-It-Note to something that exists somewhere in your house.
When you overwrite a reference, you don't actually destroy the memory it referred to, just as scribbling out the address on a Post-It-Note and writing something else changes nothing in your house: your shoes still live in the cupboard. The only exception to this in .NET is that once there are no more references left to your object, the Garbage Collector (your mum) will come and clean it up and throw it away.
By calling current = newNode; it seems like we just overwrote current and lost all references to the node we were tracking last time, but we didn't.
The second thing to remember
The Clever-Clogs who invented the Linked List knew we had to keep track of the items somehow, so they envisaged when a Node gets added, somewhere some other node needs to have a Link to it.
This is what this line of code (current.Next = newNode) was all about: making sure the new node is actually linked into the list. Yes, we then overwrote current, but as long as some other node is referencing a Node, it's not going to be cleaned up. Additionally, if we want to find it again, all we have to do is start at the first Node and traverse the linkages.
Another way of thinking about it
Think of Current as a bucket, in that bucket you have a Node, and on that Node is a piece of paper called next.
Someone hands you a new Node.
You studiously write the name of this new node (that someone gave us) on the Node you currently have in the bucket (the Next/Link Post-It-Note every node has)
You tip the bucket out on the floor and put your new Node in the bucket.
But you have to remember that the Node you tipped out is still around somewhere (in fact, there is likely another Node around with its name written on it, just as you wrote your new Node's name on it). Although we can't access these Nodes easily, they are still there if we traverse the linkages.
In essence, this is how a Linked List works: it's just a bunch of Nodes with other Nodes' names written on them.
We keep track of the list with tools like Current/Temp, and First/Head (Buckets) in the class that encapsulates this logic. Sometimes we have a Count to make it easier to know how many nodes we are tracking. Though truly, the most important part of a Linked List is the First/Head bucket. Without it we cannot traverse the list.
Current/Temp in your original method just makes it easy for us to find the last node, so you don't have to traverse the list to find it.
Example
current is the candidate position for next AddAtLast operation, that is the end node of the linked list.
> I understand why count++ is present probably because that want to keep track of position at which the new element is added but correct me if my understanding is wrong with this.
For the linked list structure you showed here, Count keeps track of the number of nodes, while current keeps track of the position where the next node will be appended, that is, the last node in the list before newNode is added. After AddAtLast links newNode to the old current, current is moved so that it refers to the updated last node (the newNode that was just added).
It looks like you are keeping track of the last element the same way you would use a tail pointer in C,
so that you always have a reference to the end of the list. That's essentially a property of type Node.
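To make the walkthrough above concrete, here is a hedged sketch of a list built around the same AddAtLast logic. The original class is not shown in the question, so ListNode, SimpleLinkedList, the sentinel head node and ToList are all assumptions added for illustration:

```csharp
using System.Collections.Generic;

public class ListNode
{
    public object Value;
    public ListNode Next;   // null while this node is the last one
}

public class SimpleLinkedList
{
    // Assumption: an empty sentinel node serves as the head, so that
    // current is never null when AddAtLast runs.
    private readonly ListNode head = new ListNode();
    private ListNode current;   // always refers to the last node

    public int Count { get; private set; }

    public SimpleLinkedList() { current = head; }

    public void AddAtLast(object data)
    {
        var newNode = new ListNode { Value = data };  // newNode.Next is null
        current.Next = newNode;   // link the old last node to the new one
        current = newNode;        // the new node is now the last node
        Count++;
    }

    // Walks the links from the head, showing that nodes "tipped out of the
    // bucket" are still reachable through the Next references.
    public List<object> ToList()
    {
        var values = new List<object>();
        for (ListNode n = head.Next; n != null; n = n.Next)
        {
            values.Add(n.Value);
        }
        return values;
    }
}
```

After three calls to AddAtLast, ToList still returns all three values in insertion order, even though current only ever refers to the most recent node.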

Why does Roslyn have two versions of syntax per language?

I have been looking at the Roslyn code base and noticed that they have two versions of syntax(One internal and one public). Often these appear to be referred to as "Red" nodes and "Green" nodes. I am wondering if anyone can explain what the reasoning is for having two versions of syntax like this.
From Persistence, Facades and Roslyn’s Red-Green Trees:
The “green” tree is immutable, persistent, has no parent references, is built “bottom-up”, and every node tracks its width but not its absolute position. When an edit happens we rebuild only the portions of the green tree that were affected by the edit, which is typically about O(log n) of the total parse nodes in the tree.
The “red” tree is an immutable facade that is built around the green tree; it is built “top-down” on demand and thrown away on every edit. It computes parent references by manufacturing them on demand as you descend through the tree from the top. It manufactures absolute positions by computing them from the widths, again, as you descend.
You, the consumer of the Roslyn API, only ever see the red tree; the green tree is an implementation detail. (And if you use the debugger to peer into the internal state of a parse node you’ll in fact see that there is a reference to another parse node in there of a different type; that’s the green tree node.)
Incidentally, these are called “red/green trees” because those were the whiteboard marker colours we used to draw the data structure in the design meeting. There’s no other meaning to the colours.
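As a rough illustration of the idea in the quote (not Roslyn's actual implementation), the sketch below pairs an immutable "green" node that knows only its width with a "red" wrapper that manufactures parent references and absolute positions as you descend. GreenNode, RedNode and GetChild are hypothetical names:

```csharp
using System.Collections.Generic;

// Green node: immutable, no parent reference, tracks width but not position.
public sealed class GreenNode
{
    public string Kind { get; }
    public int Width { get; }
    public IReadOnlyList<GreenNode> Children { get; }

    // Leaf (token) with an explicit width.
    public GreenNode(string kind, int width)
    {
        Kind = kind;
        Width = width;
        Children = new GreenNode[0];
    }

    // Interior node, built bottom-up: its width is the sum of its children's.
    public GreenNode(string kind, params GreenNode[] children)
    {
        Kind = kind;
        Children = (GreenNode[])children.Clone();
        int width = 0;
        foreach (GreenNode child in children) width += child.Width;
        Width = width;
    }
}

// Red node: a facade over a green node, created top-down on demand.
public sealed class RedNode
{
    public GreenNode Green { get; }
    public RedNode Parent { get; }   // null for the root
    public int Position { get; }     // absolute start offset

    public RedNode(GreenNode green, RedNode parent, int position)
    {
        Green = green;
        Parent = parent;
        Position = position;
    }

    // The child's position is this node's position plus the widths of the
    // siblings that come before it.
    public RedNode GetChild(int index)
    {
        int offset = Position;
        for (int i = 0; i < index; i++) offset += Green.Children[i].Width;
        return new RedNode(Green.Children[index], this, offset);
    }
}
```

Since green nodes hold no parent or position, the same green subtree can be reused in a new tree after an edit. Only the red wrappers are thrown away.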

How to refer to children in a tree with millions of nodes

I'm attempting to build a tree where each node can have an unspecified number of child nodes. The tree is to have over a million nodes in practice.
I've managed to construct the tree; however, I'm experiencing memory errors due to a full heap when I fill the tree with a few thousand nodes. The reason for this is that I'm attempting to store each node's children in a Dictionary data structure (or any data structure, for that matter). Thus, at run-time I've got thousands of such data structures being created, since each node can have an unspecified number of children, and each node's children are to be stored in this data structure.
Is there another way of doing this? I cannot simply use a variable to store a reference to the children, as there can be an unspecified number of children for each node. Thus, it is not like a binary tree, where I could have 2 variables keeping track of the left child and right child respectively.
Please no suggestions for another method of doing this. I've got my reasons for needing to create this tree, and unfortunately I cannot do otherwise.
Thanks!
How many of your nodes will be "leaf" nodes? Perhaps only create the data structure to store children when you first have a child, otherwise keeping a null reference.
Unless you need to look up the children as a map, I'd use a List<T> (initialized with an appropriate capacity) instead of a Dictionary<,> for the children. It sounds like you may have more requirements than you've explained though, which makes it hard to say.
I'm surprised you're failing after only a few thousand nodes though - you should be able to create a pretty large number of objects before having problems.
I'd also suggest that if you think you'll end up using a lot of memory, make sure you're on a 64-bit machine and make sure your application itself is set to be 64-bit. (That may just be a thin wrapper over a class library, which is fine so long as the class library is set to be 64-bit or AnyCPU.)
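The first two suggestions above could look something like this sketch, in which a node allocates its child list only when it receives its first child. LazyNode, HasChildList and the initial capacity of 2 are assumptions chosen for illustration:

```csharp
using System.Collections.Generic;

public class LazyNode
{
    // One shared empty list for all leaf nodes, so leaves allocate nothing.
    private static readonly IReadOnlyList<LazyNode> NoChildren = new LazyNode[0];

    private List<LazyNode> children;   // stays null until the first child

    public IReadOnlyList<LazyNode> Children
    {
        get { return children != null ? (IReadOnlyList<LazyNode>)children : NoChildren; }
    }

    // Exposed only so the effect can be observed in the example.
    public bool HasChildList { get { return children != null; } }

    public LazyNode AddChild()
    {
        if (children == null)
        {
            // Small starting capacity (an assumption); the list grows as needed.
            children = new List<LazyNode>(2);
        }
        var child = new LazyNode();
        children.Add(child);
        return child;
    }
}
```

In a tree where most nodes are leaves, this avoids creating a collection object for the majority of nodes.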

Visitor's nodes not suitable to be replaced by visitor?

In my small compiler I currently have a hand-made AST.
I was considering the idea of having a visitor that would look after nodes of a certain type X and would replace them by nodes of type X'. The trouble is that it seems that it isn't something easy to implement with the visitor pattern.
The only way I can see to make this work would be to have visit() methods for all kinds of nodes that could possibly have a node of type X as a child and put my node-replacing logic there, but there may be lots of those nodes. Plus, if I later decide to add a new kind of node, I run the risk of forgetting to handle that new special case in this visitor.
What's the problem I'm trying to solve:
For the current case, I have nodes of type FunctionCall in my tree that convey only the name of an operation as well as its parameters.
I'd like to substitute those with a MethodInvocation, with the appropriate OOish transformation:
m(A, B) -> A.m(B)
m(n(A, B), C) -> (A.n(B)).m(C)
Of course this can be done in a thousand different ways, the easiest being to use only a single Call class in which there may or may not be a target, but I'd like to be as explicit as possible (that is, to use different kinds of nodes to express different things), if possible.
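One possible shape for such a transformation, offered only as a sketch and not as a classic visitor: every node type implements a MapChildren method that rebuilds the node from rewritten children, and a single rewrite function replaces FunctionCall nodes bottom-up. All names here (Expr, Identifier, MapChildren, CallRewriter) are hypothetical, and the rule that the first argument becomes the target is inferred from the two examples above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class Expr
{
    // Rebuilds this node with each direct child replaced by f(child).
    // Being abstract, it cannot be forgotten when a new node kind is added.
    public abstract Expr MapChildren(Func<Expr, Expr> f);
}

public sealed class Identifier : Expr
{
    public string Id { get; }
    public Identifier(string id) { Id = id; }
    public override Expr MapChildren(Func<Expr, Expr> f) { return this; }
    public override string ToString() { return Id; }
}

public sealed class FunctionCall : Expr
{
    public string Method { get; }
    public IReadOnlyList<Expr> Args { get; }
    public FunctionCall(string method, IReadOnlyList<Expr> args) { Method = method; Args = args; }
    public override Expr MapChildren(Func<Expr, Expr> f)
    {
        return new FunctionCall(Method, Args.Select(f).ToList());
    }
    public override string ToString() { return Method + "(" + string.Join(", ", Args) + ")"; }
}

public sealed class MethodInvocation : Expr
{
    public Expr Target { get; }
    public string Method { get; }
    public IReadOnlyList<Expr> Args { get; }
    public MethodInvocation(Expr target, string method, IReadOnlyList<Expr> args)
    {
        Target = target; Method = method; Args = args;
    }
    public override Expr MapChildren(Func<Expr, Expr> f)
    {
        return new MethodInvocation(f(Target), Method, Args.Select(f).ToList());
    }
    public override string ToString()
    {
        // Parenthesize a nested invocation used as the target.
        string target = Target is MethodInvocation ? "(" + Target + ")" : Target.ToString();
        return target + "." + Method + "(" + string.Join(", ", Args) + ")";
    }
}

public static class CallRewriter
{
    // Bottom-up rewrite: children first, then the node itself.
    // A call with no arguments has no candidate target and is left as-is.
    public static Expr Rewrite(Expr node)
    {
        Expr rebuilt = node.MapChildren(Rewrite);
        var call = rebuilt as FunctionCall;
        if (call != null && call.Args.Count > 0)
        {
            return new MethodInvocation(call.Args[0], call.Method, call.Args.Skip(1).ToList());
        }
        return rebuilt;
    }
}
```

With this layout the replacement logic lives in one place, and parents never need to know which child types might be replaced, because each parent simply rebuilds itself from whatever its children were rewritten to.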
