Serialization / Deserialization of a tree structure - c#

I'm trying to figure out the best way to save (serialize) and later open (deserialize) a tree structure. My structure is made up of various object types with different properties, but each inherits from a base abstract "Node" class.
Each node has a unique ID (GUID) and an AddSuperNode(Node nd) method that sets the parent of a node. This in turn calls other methods that let the parent node know which sub nodes it has. However, some nodes also use an AddAuxSuperNode() method that adds a secondary parent to the Node.
I was using binary serialization, but now I think I want to use something where I have a bit more control, and the serialized data is more accessible. I also want to retain Type information when I deserialize, and be able to serialize private values. So DataContractSerializer seemed like the best way to go.
I can't just serialize the root Node directly because some nodes have multiple parents, and I do not want to create duplicate objects. So it seems that I need to deconstruct the tree into a flat list and serialize that, then reconstruct the tree after deserializing the list. Does this sound right?
Like I said before, each Node has a unique GUID identifier, but right now Nodes reference their parents/children directly and do not store their IDs. I could update the AddSuperNode() and AddAuxSuperNode() methods to also maintain a list of parent IDs to be serialized in addition to the direct references. But I'd rather only update/create this list when the object is being serialized, so I was thinking of creating an UpdateSuperNodeIDRefs() method in the node that would be called right before serialization.
The following is what I'm planning to do for serialization and deserialization of this structure. Can anyone suggest a better/cleaner/more efficient way to do this?
Serialization
1) Provide the root node of the tree structure
2) Break down the tree structure into a flat Dictionary<Guid, Node> where each key is the GUID of its node.
3) Call UpdateSuperNodeIDRefs() on each node to update the IDs it has saved for its parents.
4) Serialize the Dictionary of nodes with DataContractSerializer
Deserialization
1) Deserialize the Dictionary of nodes
2) Iterate through each Node in the Dictionary, reconnecting each to its parents: for any parent IDs stored, find the Node(s) in the Dictionary with matching ID(s) and call AddSuperNode() or AddAuxSuperNode() to reconnect the node to its parent(s).
3) From any Node in the Dictionary find the root of the structure
4) Return the root Node
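The plan above can be sketched roughly as follows. This is a minimal, hypothetical version: the Node class only models an ID, a name and one list of parents (AddAuxSuperNode is folded into AddSuperNode), and FlatTreeStore, Save, Load, Collect and InitLinks are illustrative names. It serializes the dictionary's values as a List<Node> and rebuilds the Guid lookup on load; the direct parent/child references are not data members, so only the parent IDs reach the XML.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;

// Hypothetical stand-in for the question's Node: an ID, a name and its parent links.
[DataContract]
public class Node
{
    [DataMember] private Guid id = Guid.NewGuid();
    [DataMember] public string Name { get; set; }
    // Parent IDs: refreshed only right before serialization (step 3 of the plan).
    [DataMember] private List<Guid> superNodeIds = new List<Guid>();

    // Direct references are not data members, so they never reach the XML.
    private List<Node> superNodes = new List<Node>();
    private List<Node> subNodes = new List<Node>();

    public Guid Id => id;
    public IReadOnlyList<Node> SuperNodes => superNodes;
    public IReadOnlyList<Node> SubNodes => subNodes;
    public IEnumerable<Guid> SuperNodeIds => superNodeIds;

    public void AddSuperNode(Node parent)
    {
        superNodes.Add(parent);
        parent.subNodes.Add(this);
    }

    public void UpdateSuperNodeIDRefs() => superNodeIds = superNodes.Select(p => p.id).ToList();

    // DataContractSerializer skips constructors and field initializers, so recreate the link lists.
    [OnDeserializing]
    private void InitLinks(StreamingContext context)
    {
        superNodes = new List<Node>();
        subNodes = new List<Node>();
    }
}

public static class FlatTreeStore
{
    public static byte[] Save(Node root)
    {
        var all = new Dictionary<Guid, Node>();
        Collect(root, all);                                             // step 2: flatten
        foreach (var node in all.Values) node.UpdateSuperNodeIDRefs(); // step 3
        var dcs = new DataContractSerializer(typeof(List<Node>));
        using (var ms = new MemoryStream())
        {
            dcs.WriteObject(ms, all.Values.ToList());                   // step 4
            return ms.ToArray();
        }
    }

    private static void Collect(Node node, Dictionary<Guid, Node> all)
    {
        if (all.ContainsKey(node.Id)) return; // already reached through another parent
        all.Add(node.Id, node);
        foreach (var child in node.SubNodes) Collect(child, all);
    }

    public static Node Load(byte[] data)
    {
        var dcs = new DataContractSerializer(typeof(List<Node>));
        List<Node> nodes;
        using (var ms = new MemoryStream(data))
            nodes = (List<Node>)dcs.ReadObject(ms);      // step 1
        var byId = nodes.ToDictionary(n => n.Id);
        foreach (var node in nodes)                      // step 2: reconnect parents
            foreach (var parentId in node.SuperNodeIds)
                node.AddSuperNode(byId[parentId]);
        var root = nodes[0];                             // step 3: climb to a node with no parents
        while (root.SuperNodes.Count > 0) root = root.SuperNodes[0];
        return root;                                     // step 4
    }
}
```

One design point worth noting: DataContractSerializer does not run constructors or field initializers when deserializing, which is why the sketch recreates the link lists in an [OnDeserializing] callback.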

If a node has multiple parents, then it isn't a tree; it is, presumably, a graph. However, worry not: DataContractSerializer can handle this for you:
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
class Node
{
    [DataMember]
    public Node AnotherNode { get; set; }
}

static class Program
{
    static void Main()
    {
        Node a = new Node(), b = new Node();
        // make it a cyclic graph, to prove reference-mode
        a.AnotherNode = b;
        b.AnotherNode = a;
        // the preserveObjectReferences argument (the fifth one, true) is the interesting one here;
        // the arguments are: type, knownTypes, maxItemsInObjectGraph,
        // ignoreExtensionDataObject, preserveObjectReferences, dataContractSurrogate
        DataContractSerializer dcs = new DataContractSerializer(
            typeof(Node), null, int.MaxValue, false, true, null);
        using (MemoryStream ms = new MemoryStream())
        {
            dcs.WriteObject(ms, a);
            ms.Position = 0;
            Node c = (Node)dcs.ReadObject(ms);
            // so c.AnotherNode.AnotherNode should be c itself again
            Console.WriteLine(ReferenceEquals(c, c.AnotherNode.AnotherNode));
        }
    }
}
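A sketch of the same idea for the question's setup (an abstract base with derived node types and several parents per node), assuming .NET Framework 4.5+ or .NET Core, where DataContractSerializerSettings.PreserveObjectReferences can be used instead of the long constructor. GraphNode, GroupNode, LeafNode and GraphStore are hypothetical names; the [KnownType] attributes are what let the serializer write and restore the concrete derived types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical hierarchy: an abstract base plus derived node types, as in the question.
[DataContract]
[KnownType(typeof(GroupNode))]
[KnownType(typeof(LeafNode))]
public abstract class GraphNode
{
    [DataMember] public Guid Id { get; set; } = Guid.NewGuid();
    [DataMember] public List<GraphNode> Parents { get; set; } = new List<GraphNode>();
    [DataMember] public List<GraphNode> Children { get; set; } = new List<GraphNode>();

    // A node may be given several parents; both directions are stored as references.
    public void AddParent(GraphNode parent)
    {
        Parents.Add(parent);
        parent.Children.Add(this);
    }
}

[DataContract]
public class GroupNode : GraphNode
{
    [DataMember] public string Title { get; set; }
}

[DataContract]
public class LeafNode : GraphNode
{
    [DataMember] public int Weight { get; set; }
}

public static class GraphStore
{
    // PreserveObjectReferences writes each object once (z:Id) and later occurrences as
    // references (z:Ref), which keeps shared nodes shared and allows Parents <-> Children cycles.
    private static DataContractSerializer CreateSerializer() =>
        new DataContractSerializer(typeof(GraphNode),
            new DataContractSerializerSettings { PreserveObjectReferences = true });

    public static byte[] Save(GraphNode root)
    {
        using (var ms = new MemoryStream())
        {
            CreateSerializer().WriteObject(ms, root);
            return ms.ToArray();
        }
    }

    public static GraphNode Load(byte[] data)
    {
        using (var ms = new MemoryStream(data))
            return (GraphNode)CreateSerializer().ReadObject(ms);
    }
}
```

Because every node is written once and then referred to, a node shared by two parents comes back as a single object, so no manual flattening or ID bookkeeping is needed.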

Related

How is this code working? Here the class name is used as a datatype in its own implementation

public class Node
{
    public int Value { get; set; }
    public Node Next { get; set; }
}
I am a complete beginner in programming. I decided to learn C# as my first programming language. I came across this code.
How is 'Node' defined as the datatype for Next? It is confusing me a lot.

This is a good question to ask as you're learning about C#. The key is that there are two kinds of types in C#: "value" types and "reference" types. See this question and its answers for more details.
Because Node is declared as a class, that means it is a reference type.
If you make a variable with a reference type, then that variable doesn't hold the data directly; instead, it holds a reference that can point to the data. By default, references have the special value null, which means they don't point to anything. When you assign a variable e.g. myNode.Next = someOtherNode, you don't copy the entirety of someOtherNode to the Next property; you just copy a reference to someOtherNode into the property.
So by the Node class itself having a Node property, a Node object doesn't actually contain another Node object. The first object contains a reference to the second object. This allows one node to point to another node, which can then point to another node, and so on. A collection of nodes organized this way is called a linked list; in this case, it's a linked list of int (32-bit integer) values.
If Node were a value type (declared as a struct instead of a class), then there would indeed be a problem. Value type variables contain the data directly, so you cannot have an instance of a value type which contains another instance of that same value type.
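To make the reference behaviour concrete, here is a small sketch using the Node class from the question; LinkedListDemo, BuildOneTwoThree and Sum are made-up names for illustration. Sum walks the chain by following Next until it reaches null.

```csharp
public class Node
{
    public int Value { get; set; }
    public Node Next { get; set; } // a reference to another Node, or null at the end of the list
}

public static class LinkedListDemo
{
    // Builds the list 1 -> 2 -> 3 and returns its first node.
    public static Node BuildOneTwoThree()
    {
        var third = new Node { Value = 3 };                // Next stays null: end of the list
        var second = new Node { Value = 2, Next = third }; // stores a reference, not a copy
        return new Node { Value = 1, Next = second };
    }

    // Follows the Next references until it reaches null.
    public static int Sum(Node head)
    {
        int total = 0;
        for (Node current = head; current != null; current = current.Next)
            total += current.Value;
        return total;
    }
}
```

For example, `var alias = head.Next; alias.Value = 20;` changes the second node itself, because alias and head.Next refer to the same object.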

As the other answers say, this class represents a node in a linked list. In this case a Node can point to another instance of Node.
Node                    Node                    Node
{                       {                       {
  int Value = 1;          int Value = 2;          int Value = 3;
  Node Next ========>     Node Next ========>     Node Next ========> null
}                       }                       }
You don't usually come across classes having references to themselves like Node.

This is called a linked list, an example of a recursive data structure. This can easily be instantiated in C# because the recursion can be ended by letting the last node in the list have a null value for the Next property.
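As a rough illustration of the recursion ending at null, here are two recursive helpers over the same Node shape (RecursiveList, Count and Describe are hypothetical names). The null check is the base case that stops each chain of calls.

```csharp
public class Node
{
    public int Value { get; set; }
    public Node Next { get; set; }
}

public static class RecursiveList
{
    // Base case: an empty list (null) has no nodes; otherwise count this node plus the rest.
    public static int Count(Node node) => node == null ? 0 : 1 + Count(node.Next);

    // Builds a text form such as "1 -> 2 -> null", again stopping at the null reference.
    public static string Describe(Node node) =>
        node == null ? "null" : node.Value + " -> " + Describe(node.Next);
}
```

For the list 1 -> 2 -> 3, Count returns 3 and Describe returns "1 -> 2 -> 3 -> null".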

That looks like a class defining a node in a singly linked list that stores integer values. Remember, a class is just a definition of the shape of an object. It might be used like this:
var node1 = new Node { Value = 1 };
node1.Next = new Node { Value = 2 };

(Manually) serialize and de-serialize a binary search tree

I have implemented a binary search tree in C# using the standard approach.
The complete code is here
I can't figure out how to serialize and de-serialize it with a custom approach. How can this be done manually in C#?

I don't see why you wouldn't use one of the standard (de)serialization techniques (BinaryFormatter, XmlSerializer, data contracts, protocol buffers)?
But if you really want to use the approach given in the link, the point of the article can be summarized as:
A simple solution is to store both Inorder and Preorder traversals. This solution requires space twice the size of the Binary Tree.
When represented this way, you have to use a "dummy" value for empty nodes. And since the author of the linked article used the tree to store integers, (s)he chose to use the "special" -1 value for empty nodes.
But if you are not storing the tree this way internally (I presume you are using linked nodes, as in a linked list), then there is no point in adding these dummy values. If you are storing plain C# objects, then a null value clearly describes an empty node.
If your intention is to port the C++ code to C# completely, then the serialization method would look like this:
// Stores a tree in preorder, writing MARKER in place of each empty (null) child.
// MARKER is a sentinel value that never occurs as a key, e.g. const int MARKER = -1;
void Serialize(Node root, StreamWriter writer)
{
    // If the current node is null, store the marker
    if (root == null)
    {
        writer.Write("{0} ", MARKER);
        return;
    }
    // Otherwise, store the current node and recurse into its children
    writer.Write("{0} ", root.key);
    Serialize(root.leftc, writer);
    Serialize(root.rightc, writer);
}
But this is very specific to your tree, as it only works for simple keys (like integers in your case), and it's not very space/speed efficient.
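The answer only shows the writing side. A matching reader could look like the sketch below, assuming the node shape implied by the answer's code (key, leftc, rightc), integer keys, and -1 as the MARKER as in the article; PreorderCodec and ReadNode are hypothetical names. Because the writer emits a node before its left and right subtrees, the reader consumes tokens in the same preorder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Assumed shape of the asker's node; field names follow the answer's code.
public class Node
{
    public int key;
    public Node leftc, rightc;
    public Node(int key) { this.key = key; }
}

public static class PreorderCodec
{
    // Sentinel for an empty child; -1 as in the article, so -1 must never be a real key.
    public const int MARKER = -1;

    public static void Serialize(Node root, TextWriter writer)
    {
        if (root == null) { writer.Write("{0} ", MARKER); return; }
        writer.Write("{0} ", root.key);
        Serialize(root.leftc, writer);
        Serialize(root.rightc, writer);
    }

    // Reads all space-separated tokens, then rebuilds the tree in the same preorder.
    public static Node Deserialize(TextReader reader)
    {
        var tokens = new Queue<string>(
            reader.ReadToEnd().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        return ReadNode(tokens);
    }

    private static Node ReadNode(Queue<string> tokens)
    {
        int value = int.Parse(tokens.Dequeue());
        if (value == MARKER) return null;   // empty child: stop this branch
        var node = new Node(value);
        node.leftc = ReadNode(tokens);      // left subtree comes first...
        node.rightc = ReadNode(tokens);     // ...then the right subtree
        return node;
    }
}
```

Serialize takes a TextWriter here, so the answer's StreamWriter still works, and a StringWriter/StringReader pair is handy for testing.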

When writing binary data to a file (or stream), you need to put some "marker" (indicator) for null, in contrast with XML, where you have a natural "missing" element/attribute. It could be anything; the most natural would be a bool representing something similar to Nullable<T>.HasValue, but for a Node reference, like this:
using System.IO;

class ObjectPersistence
{
    public void StoreBSTToFile(BST bst, string TreeStoreFile)
    {
        using (var writer = new BinaryWriter(File.Create(TreeStoreFile)))
            WriteNode(writer, bst.root);
    }

    public BST ReadBSTFromFile(string TreeStoreFile)
    {
        using (var reader = new BinaryReader(File.OpenRead(TreeStoreFile)))
            return new BST { root = ReadNode(reader) };
    }

    // Writes a presence flag, then (if present) the key and both subtrees in preorder
    private static void WriteNode(BinaryWriter output, Node node)
    {
        if (node == null)
            output.Write(false);
        else
        {
            output.Write(true);
            output.Write(node.key);
            WriteNode(output, node.leftc);
            WriteNode(output, node.rightc);
        }
    }

    // Mirrors WriteNode: false means null, true is followed by the key and both subtrees
    private static Node ReadNode(BinaryReader input)
    {
        if (!input.ReadBoolean()) return null;
        var node = new Node();
        node.key = input.ReadInt32();
        node.leftc = ReadNode(input);
        node.rightc = ReadNode(input);
        return node;
    }
}

What .NET framework collection class is for modeling trees? [duplicate]

I want to store an organisation chart in a collection. I think a tree data structure will be best suited to my needs, as I need to add multiple nodes to one node.
LinkedList only allows adding one node after another node, if I understand it correctly.
I have looked at the C5 TreeSet collection, but it doesn't seem to have an Add() method to add more than 2 nodes to one node.
I have also looked at the TreeView class from the Windows Forms library, but I do not want to add the Windows Forms DLL to my project, since I am building a service layer application (or is it fine?).
I'd rather not write my own tree collection class if a third party already provides one.
Any suggestions, please?
Thanks

Something like this can be a starting point. By using generics, this one can hold a tree of anything:
using System.Collections.Generic;

class TreeNode<T>
{
    public List<TreeNode<T>> Children { get; } = new List<TreeNode<T>>();
    public T Item { get; set; }

    public TreeNode(T item)
    {
        Item = item;
    }

    // Creates a child node for item, attaches it, and returns it so it can get children of its own
    public TreeNode<T> AddChild(T item)
    {
        TreeNode<T> nodeItem = new TreeNode<T>(item);
        Children.Add(nodeItem);
        return nodeItem;
    }
}
A sample which holds a tree of strings:
string root = "root";
TreeNode<string> myTreeRoot = new TreeNode<string>(root);
var first = myTreeRoot.AddChild("first child");
var second = myTreeRoot.AddChild("second child");
var grandChild = first.AddChild("first child's child");
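Building on that starting point, here is a hedged sketch for the organisation-chart case: a variant of TreeNode<T> that exposes Item and a read-only Children list, plus a hypothetical Print method that walks the tree depth-first and indents each level by two spaces.

```csharp
using System.Collections.Generic;
using System.Text;

// Variation on the answer's TreeNode<T>: item and children are exposed read-only.
public class TreeNode<T>
{
    private readonly List<TreeNode<T>> children = new List<TreeNode<T>>();

    public TreeNode(T item) { Item = item; }

    public T Item { get; }
    public IReadOnlyList<TreeNode<T>> Children => children;

    // Any number of children can be attached to one node.
    public TreeNode<T> AddChild(T item)
    {
        var node = new TreeNode<T>(item);
        children.Add(node);
        return node;
    }

    // Writes the tree depth-first, one node per line, indenting each level by two spaces.
    public void Print(StringBuilder output, int depth = 0)
    {
        output.Append(' ', depth * 2).Append(Item).Append('\n');
        foreach (var child in children) child.Print(output, depth + 1);
    }
}
```

For example, a CEO node with children CTO (who has a child "Dev lead") and CFO prints CEO first, then CTO indented one level, Dev lead indented two levels, and finally CFO indented one level.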

Recursive collection search

I have a collection (List<Element>) of objects as described below:
class Element
{
    string Name;
    string Value;
    ICollection<Element> ChildCollection;
    IDictionary<string, string> Attributes;
}
I build a List<Element> collection of Element objects based on some XML that I read in, this I am quite happy with. How to implement searching of these elements currently has me, not stumped, but wondering if there is a better solution.
The structure of the collection looks something like this:
- Element (A)
  - Element (A1)
    - Element (A1.1)
  - Element (A2)
- Element (B)
  - Element (B1)
    - Element (B1.1)
    - Element (B1.2)
- Element (C)
  - Element (C1)
  - Element (C2)
  - Element (C3)
Currently I am using recursion to search the Attributes dictionary of each top-level (A, B, C) Element for a particular KeyValuePair. If I do not find it in the top-level Element, I start searching its ChildCollection (1, 1.1, 2, 2.1, n, etc.) in the same manner.
What I am curious about is whether there is a better method of implementing a search on these objects or whether recursion is the better answer in this instance, and whether I should implement the search as I currently do (top -> child -> child -> etc.) or search in some other order, such as all top levels first.
Could I, and would it be reasonable to, use the TPL to search each top level (A, B, C) in parallel?

Recursion is one way of implementing a tree search where you visit elements in depth-first order. You can implement the same algorithm with a loop instead of recursion by using a stack data structure to store the nodes of your tree that you need to visit.
If you use the same algorithm with a queue instead of a stack, the search proceeds in breadth-first order.
In both cases the general algorithm looks like this:
var nodes = ... // some collection of nodes
nodes.Add(root);
while (nodes.Count != 0) {
    var current = nodes.Remove ... // Take the current node from the collection.
    foreach (var child in current.ChildCollection) {
        nodes.Add(child);
    }
    // Process the current node
    if (current.Attributes ...) {
        ...
    }
}
Note that the algorithm is not recursive: it uses an explicit collection of nodes to save the current state of the search, whereas a recursive implementation uses the call stack for the same purpose. If nodes is a Stack<Element>, the search proceeds in depth-first order; if nodes is a Queue<Element>, the search proceeds in breadth-first order.
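A runnable version of that loop, as a sketch: it assumes an Element with public Name, ChildCollection and Attributes members (the question's fields are private), and ElementSearch, FindDepthFirst, FindBreadthFirst and Matches are hypothetical names. Both methods share the same loop; only the container differs.

```csharp
using System.Collections.Generic;

// Public-member version of the question's Element class (an assumption for this sketch).
public class Element
{
    public string Name { get; set; }
    public List<Element> ChildCollection { get; } = new List<Element>();
    public Dictionary<string, string> Attributes { get; } = new Dictionary<string, string>();
}

public static class ElementSearch
{
    // Depth-first: an explicit Stack<Element> takes the place of the recursive call stack.
    public static Element FindDepthFirst(IEnumerable<Element> roots, string key, string value)
    {
        var pending = new Stack<Element>();
        foreach (var root in roots) pending.Push(root);
        while (pending.Count != 0)
        {
            var current = pending.Pop();
            if (Matches(current, key, value)) return current;
            foreach (var child in current.ChildCollection) pending.Push(child);
        }
        return null;
    }

    // Breadth-first: the same loop with a Queue<Element> visits the tree level by level.
    public static Element FindBreadthFirst(IEnumerable<Element> roots, string key, string value)
    {
        var pending = new Queue<Element>(roots);
        while (pending.Count != 0)
        {
            var current = pending.Dequeue();
            if (Matches(current, key, value)) return current;
            foreach (var child in current.ChildCollection) pending.Enqueue(child);
        }
        return null;
    }

    private static bool Matches(Element e, string key, string value) =>
        e.Attributes.TryGetValue(key, out var found) && found == value;
}
```

Note that the stack version visits siblings in reverse order (C before B before A), because the last element pushed is the first one popped; push children in reverse if that order matters.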

I grabbed this bit from SO somewhere; it's not mine, but I can't provide a link to it. This class flattens out a TreeView for a recursive search; it looks like it should do the same for you.
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public static class SOExtension
{
    // Flattens every node in the TreeView into a single sequence
    public static IEnumerable<TreeNode> FlattenTree(this TreeView tv)
    {
        return FlattenTree(tv.Nodes);
    }

    // Returns the nodes of this collection, followed by all of their descendants
    public static IEnumerable<TreeNode> FlattenTree(this TreeNodeCollection coll)
    {
        return coll.Cast<TreeNode>()
                   .Concat(coll.Cast<TreeNode>()
                               .SelectMany(x => FlattenTree(x.Nodes)));
    }
}
I found the link I got this from, and it's very easy to use; have a look: Is there a method for searching for TreeNode.Text field in TreeView.Nodes collection?
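The same Concat/SelectMany pattern works without Windows Forms. A minimal sketch for the question's Element type, assuming a public ChildCollection (Flatten and ElementExtensions are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Element with just the members flattening needs.
public class Element
{
    public string Name { get; set; }
    public List<Element> ChildCollection { get; } = new List<Element>();
}

public static class ElementExtensions
{
    // Same shape as FlattenTree: the items themselves, then the flattened children of each item.
    public static IEnumerable<Element> Flatten(this IEnumerable<Element> elements) =>
        elements.Concat(elements.SelectMany(e => e.ChildCollection.Flatten()));
}
```

Searching then becomes ordinary LINQ, e.g. `roots.Flatten().FirstOrDefault(e => e.Name == "A1.1")`. Each call returns the items themselves followed by the flattened children of each item, so the resulting order is neither strictly depth-first nor breadth-first.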

You can reuse existing components designed specifically for traversing in different ways, such as the NETFx IEnumerable.Traverse extension method. It lets you traverse an enumerable tree depth-first or breadth-first.
Example to get a flattened enumerable of directories:
IEnumerable<DirectoryInfo> directories = ... ;
IEnumerable<DirectoryInfo> allDirsFlattened = directories.Traverse(TraverseKind.BreadthFirst, dir => dir.EnumerateDirectories());
foreach (DirectoryInfo directoryInfo in allDirsFlattened)
{
    ...
}
For BreadthFirst it uses Queue<T> internally, and for DepthFirst it uses Stack<T> internally.
It does not traverse nodes in parallel, and unless the traversal is resource-demanding, it isn't appropriate to use parallelism at this level. But that depends on the context.
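If profiling does show that searching a single top-level subtree is expensive, one way to try the parallel version asked about in the question is PLINQ, sketched below. The Element shape and the names ParallelSearch, FindInSubtree and Find are assumptions. AsOrdered makes FirstOrDefault return the same hit a sequential A, B, C scan would, although all subtrees may still end up being searched.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Element with public members (the question's fields are private).
public class Element
{
    public string Name { get; set; }
    public List<Element> ChildCollection { get; } = new List<Element>();
    public Dictionary<string, string> Attributes { get; } = new Dictionary<string, string>();
}

public static class ParallelSearch
{
    // Sequential, recursive depth-first search of one subtree.
    private static Element FindInSubtree(Element e, string key, string value)
    {
        if (e.Attributes.TryGetValue(key, out var v) && v == value) return e;
        foreach (var child in e.ChildCollection)
        {
            var hit = FindInSubtree(child, key, value);
            if (hit != null) return hit;
        }
        return null;
    }

    // Each top-level element's subtree may be searched on a different thread;
    // AsOrdered keeps the first hit in source (A, B, C) order.
    public static Element Find(IEnumerable<Element> topLevel, string key, string value) =>
        topLevel.AsParallel().AsOrdered()
                .Select(e => FindInSubtree(e, key, value))
                .FirstOrDefault(hit => hit != null);
}
```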

