(Manually) serialize and de-serialize a binary search tree

I have implemented a binary search tree in C# using the standard approach.
The complete code is here
I can't figure out how to serialize and de-serialize it using a custom approach. How can this be done manually in C#?

I don't see why you wouldn't use one of the standard (de)serialization techniques (BinaryFormatter, XmlSerializer, data contracts, protocol buffers).
But if you really want to use the approach given in the link, the point of the article can be summarized as:
A simple solution is to store both Inorder and Preorder traversals. This solution requires space twice the size of Binary Tree.
When represented this way, you have to use a "dummy" value for empty nodes. And since the author of the linked article used the tree to store integers, (s)he chose to use the "special" -1 value for empty nodes.
But if you are not storing the tree this way internally (I presume you are using linked nodes), then there is no point in adding these dummy values. If you are storing plain C# objects, then a null value clearly describes an empty node.
If your intention is to port the C++ code to C# completely, then the serialization method would look like this:
// Sentinel written in place of a missing child; -1 as in the linked article,
// so this only works if -1 is never a real key
const int MARKER = -1;

// Stores the tree in pre-order using the given writer
void Serialize(Node root, StreamWriter writer)
{
    // If the current node is null, store the marker
    if (root == null)
    {
        writer.Write("{0} ", MARKER);
        return;
    }
    // Otherwise store the current node and recurse into its children
    writer.Write("{0} ", root.key);
    Serialize(root.leftc, writer);
    Serialize(root.rightc, writer);
}
But this is very specific to your tree, as it only works for simple keys (like integers in your case), and it's not very space/speed efficient.
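The answer above only shows the writing half. As a rough sketch of the matching read side, assuming a hypothetical Node class with the same key/leftc/rightc fields and assuming -1 never occurs as a real key, the tokens can be consumed in the same pre-order in which they were written (TextTreeSerializer and ReadNode are illustrative names, not from the original code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical node type; field names mirror the ones used in the answer.
class Node
{
    public int key;
    public Node leftc, rightc;
}

static class TextTreeSerializer
{
    // Sentinel for a missing child (assumption: -1 is never a real key).
    const int MARKER = -1;

    public static void Serialize(Node root, TextWriter writer)
    {
        if (root == null)
        {
            writer.Write("{0} ", MARKER);
            return;
        }
        writer.Write("{0} ", root.key);
        Serialize(root.leftc, writer);
        Serialize(root.rightc, writer);
    }

    public static Node Deserialize(string text)
    {
        var tokens = new Queue<string>(
            text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        return ReadNode(tokens);
    }

    // Consumes tokens in the same pre-order in which Serialize wrote them.
    static Node ReadNode(Queue<string> tokens)
    {
        if (tokens.Count == 0) return null;
        int value = int.Parse(tokens.Dequeue());
        if (value == MARKER) return null;

        var node = new Node { key = value };
        node.leftc = ReadNode(tokens);
        node.rightc = ReadNode(tokens);
        return node;
    }
}
```

Because every missing child is written explicitly, a single pre-order pass is enough to rebuild the exact shape of the tree; no in-order traversal is needed.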

When writing binary data to a file (or stream), you need to put some "marker" (indicator) for null (in contrast with XML, where you have a natural "missing" element/attribute). It could be anything; the most natural choice is a bool that plays the role of Nullable<T>.HasValue, but for a Node reference, like this:
using System.IO;

class ObjectPersistence
{
    public void StoreBSTToFile(BST bst, string TreeStoreFile)
    {
        using (var writer = new BinaryWriter(File.Create(TreeStoreFile)))
            WriteNode(writer, bst.root);
    }

    public BST ReadBSTFromFile(string TreeStoreFile)
    {
        using (var reader = new BinaryReader(File.OpenRead(TreeStoreFile)))
            return new BST { root = ReadNode(reader) };
    }

    private static void WriteNode(BinaryWriter output, Node node)
    {
        if (node == null)
            output.Write(false); // "no node here"
        else
        {
            output.Write(true); // a node follows
            output.Write(node.key);
            WriteNode(output, node.leftc);
            WriteNode(output, node.rightc);
        }
    }

    private static Node ReadNode(BinaryReader input)
    {
        // Reads back in the same order WriteNode wrote: flag, key, left, right
        if (!input.ReadBoolean()) return null;
        var node = new Node();
        node.key = input.ReadInt32();
        node.leftc = ReadNode(input);
        node.rightc = ReadNode(input);
        return node;
    }
}

Related

Is there a simpler way to skip already-explored nodes in a tree? [C#]

I have a tree in which the same node can appear more than once.
If a node has already been explored, I want to skip it.
Of course, this is a simple topic, but I am curious whether there is a simpler way.
The code that comes to mind is something like this:
void Explore(Tree tree, HashSet<Tree> exploredTrees)
{
    // Skip nodes that were already explored
    if (exploredTrees.Contains(tree))
        return;
    exploredTrees.Add(tree);
    foreach (var childTree in tree.ChildTree)
    {
        Explore(childTree, exploredTrees);
    }
}

static void Main()
{
    // Assumes there is data in the tree.
    Tree tree = new Tree();
    Explore(tree, new HashSet<Tree>());
}
I've been using the code above so far, but the second parameter is getting on my nerves (new HashSet<Tree>() in the example above).
As you know, this needs a data structure to store the nodes already explored.
However, I'm not happy that the data structure has to be passed in from outside (e.g. Explore(tree, new HashSet<Tree>())).
Is there a way to achieve this without the second parameter in C#?
I don't want to use a static field, because then the caller has to remember to clear the data structure.
Thank you for reading.

The general approach is correct, but you could simply add a helper method that creates the hashSet for you: void Explore(Tree tree) => Explore(tree, new HashSet<Tree>())
Or you could use an iterative solution that lets you keep the HashSet as a local variable:
public static IEnumerable<T> DepthFirstLoopSafe<T>(T self, Func<T, IEnumerable<T>> selector, IEqualityComparer<T> equalityComparer = default)
{
    var stack = new Stack<T>();
    var visited = new HashSet<T>(equalityComparer ?? EqualityComparer<T>.Default);
    stack.Push(self);
    while (stack.Count > 0)
    {
        var current = stack.Pop();
        // A node can be pushed more than once before it is visited; skip the repeats
        if (!visited.Add(current))
            continue;
        yield return current;
        foreach (var child in selector(current))
        {
            if (!visited.Contains(child))
            {
                stack.Push(child);
            }
        }
    }
}
Called like DepthFirstLoopSafe(tree, t => t.ChildTree). I like to use generics to describe the iteration of trees, since it allows reuse of code for all kinds of trees, regardless of type or how the tree is described.
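For the recursive variant, one way to hide the set from callers is a local function that captures it. This is only a sketch: Tree here is a hypothetical stand-in with the ChildTree property from the question, and TreeExplorer/Visit are made-up names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree type; ChildTree mirrors the property name from the question.
class Tree
{
    public string Name;
    public List<Tree> ChildTree = new List<Tree>();
}

static class TreeExplorer
{
    // Public entry point: callers never see the bookkeeping set.
    public static List<Tree> Explore(Tree root)
    {
        var explored = new HashSet<Tree>();
        var order = new List<Tree>();

        // The local function captures 'explored', so no second parameter is needed.
        void Visit(Tree tree)
        {
            if (!explored.Add(tree)) return; // Add returns false if already present
            order.Add(tree);
            foreach (var child in tree.ChildTree)
                Visit(child);
        }

        Visit(root);
        return order;
    }
}
```

The set lives only for the duration of one Explore call, so there is nothing to clear afterwards, which was the concern with a static field.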

A special C# Tree algorithm in Umbraco CMS

I'm creating a special tree algorithm and need a bit of help with the code I currently have, but before you look at it, let me explain what it is meant to do.
I have a tree structure and I'm interacting with a node (any of the nodes in the tree; these nodes are Umbraco CMS classes). Upon interaction I walk the tree up to the top (to the root) and store these nodes in a global collection (List<Node> in this particular case). So far, so good. But upon interaction with another node I must check whether the list already contains the parents of the clicked node. If it contains every parent but not the node itself, then the interaction is on the lowest level (I hope you are still with me).
Unfortunately, calling Contains() on the list in Umbraco CMS doesn't detect that the list already contains the values, so the same values get added all over again even though I added the Contains() check.
Can anyone give me a hand here if they have already run into such a problem? I replaced the Contains() call with the Except and Union functions, and they yield the same result: the list still contains duplicates.
var currentValue = (string)CurrentPage.technologies;
List<Node> globalNodeList = new List<Node>();
string[] result = currentValue.Split(',');
foreach (var item in result)
{
    var node = new Node(int.Parse(item));
    if (globalNodeList.Count > 0)
    {
        List<Node> nodeParents = new List<Node>();
        if (node.Parent != null)
        {
            while (node != null)
            {
                if (!nodeParents.Contains(node))
                {
                    nodeParents.Add(node);
                }
                node = (Node)node.Parent;
            }
        }
        else { globalNodeList.Add(node); }
        if (nodeParents.Count > 0)
        {
            var differences = globalNodeList.Except<Node>(globalNodeList);
            globalNodeList = globalNodeList.Union<Node>(differences).ToList<Node>();
        }
    }
    else
    {
        if (node.Parent != null)
        {
            while (node != null)
            {
                globalNodeList.Add(node);
                node = (Node)node.Parent;
            }
        }
        else
        {
            globalNodeList.Add(node);
        }
    }
}
}

If I understand your question, you only want to see if a particular node is an ancestor of another node. If so, just do a string check on the Path property of the node. The Path property is a comma-separated string. No need to build the list yourself.
Just myNode.Path.Contains(",1001") will work.
Small remarks:
If you are using Umbraco 6, use IPublishedContent instead of Node.
Rather than building a list the way you do, I would pass multiple IDs to the Umbraco helper and let Umbraco build the list (from cache).
For the second remark, you are able to do this:
var myList = Umbraco.Content(1001,1002,1003);
or with an array/list
var myList = Umbraco.Content(someNode.Path.Split(','));
and because you are crawling up to the root, you might need to add a .Reverse()
More information about the UmbracoHelper can be found in the documentation: http://our.umbraco.org/documentation/Reference/Querying/UmbracoHelper/
If you are using Umbraco 4 you can use @Library.NodesById(...)
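One caveat with the plain substring check on Path: ",1001" would also match an id such as 10011. A slightly stricter sketch compares whole segments instead. It works on a plain string and does not touch any Umbraco API; PathHelper and PathContainsId are hypothetical names, and the sample path format ("-1,1001,1050") is an assumption about what Path looks like.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class PathHelper
{
    // 'path' is assumed to be a comma-separated id list such as "-1,1001,1050".
    // Returns true if 'id' appears as a whole segment of the path.
    public static bool PathContainsId(string path, int id)
    {
        string wanted = id.ToString(CultureInfo.InvariantCulture);
        return path.Split(',').Any(segment => segment.Trim() == wanted);
    }
}
```

Called like PathHelper.PathContainsId(myNode.Path, 1001). If the last segment is the node's own id, this also returns true for the node itself, so exclude that segment if only strict ancestors should match.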

Most appropriate way to construct a File and Directory class in order to easily filter results when placing them on a tree

I am creating a program that recursively finds all the files and directories in the specified path. So a node may have child nodes if it happens to be a directory.
Here is my Node class:
class Node
{
    public List<Node> Children = new List<Node>(); // if node is a directory then children will be the files and directories in this directory
    public FileSystemInfo Value { get; set; } // can either be a FileInfo or DirectoryInfo
    public bool IsDirectory
    {
        get { return Value is DirectoryInfo; }
    }
    public long Size // HERE IS WHERE I AM HAVING PROBLEMS! I NEED TO RETRIEVE THE
    {                // SIZE OF DIRECTORIES AS WELL AS FOR FILES.
        get
        {
            long sum = 0;
            if (Value is FileInfo)
                sum += ((FileInfo)Value).Length;
            else
                sum += Children.Sum(x => x.Size);
            return sum;
        }
    }
    // this is the method I use to filter results in the tree
    public Node Search(Func<Node, bool> predicate)
    {
        // if node is a leaf
        if (this.Children.Count == 0)
        {
            if (predicate(this))
                return this;
            else
                return null;
        }
        else // Otherwise if node is not a leaf
        {
            var results = Children.Select(i => i.Search(predicate)).Where(i => i != null).ToList();
            if (results.Any()) // THIS IS HOW I REMOVE NODES AND RECONSTRUCT THE TREE WITH A FILTER
            {
                var result = (Node)MemberwiseClone();
                result.Children = results;
                return result;
            }
            return null;
        }
    }
}
Thanks to that Node class I am able to display the tree as:
In one column I display the name of the directory or file, and on the right the size. The size is formatted as currency just because the commas help visualize it more clearly.
Now to my problem. The reason I wrote this program was to perform some advanced searches. For example, I may only want to search for files that have the ".txt" extension. If I apply that filter to my tree I will get:
(Note that I compile the text to a function that takes a Node and returns a bool, and I pass that function to the Search method on my Node class in order to filter results. More information on how to dynamically compile code can be found at: http://www.codeproject.com/Articles/10324/Compiling-code-during-runtime) Anyway, that has nothing to do with this question. The important part is that I removed all the nodes that did not match the criteria, and because I removed those nodes, the sizes of the directories changed!
So my question is: how can I filter results while keeping the real size of each directory? I guess I will have to remove the Size property and replace it with a field. The problem with that is that every time I add to the tree I will have to update the size of all the parent directories, and that gets complex. Before I start coding it that way, I would appreciate your opinion on how I should implement the class.

Since you're using recursion and your weight is a node-level property, you can't expect it to keep summing to the same total after you remove nodes. Either promote it to an upper level (the collection) or use an external counter within the recursion (one that counts regardless of the filter; you'll need to carry it through the recursion).
Anyway, why are you re-implementing core .NET functionality? Any reason beyond filtering or recursive search? Both are pretty well implemented in the BCL.
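One possible way to keep the real sizes, sketched under simplifying assumptions: compute every subtree's size once on the full tree and cache it in a field, then filter. Since MemberwiseClone copies fields, the filtered copies keep the cached totals. SizedNode, OwnSize, TotalSize and ComputeSizes are hypothetical names; the class uses a plain number instead of FileSystemInfo so that it stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the question's Node: a name plus an own size instead of FileSystemInfo.
class SizedNode
{
    public string Name;
    public long OwnSize;   // file length; 0 for directories
    public long TotalSize; // cached subtree size, filled in by ComputeSizes
    public List<SizedNode> Children = new List<SizedNode>();

    // Walks the full (unfiltered) tree once and stores each subtree's size.
    public long ComputeSizes()
    {
        TotalSize = OwnSize + Children.Sum(c => c.ComputeSizes());
        return TotalSize;
    }

    // Same shape as the question's Search; the clone keeps the cached TotalSize.
    public SizedNode Search(Func<SizedNode, bool> predicate)
    {
        if (Children.Count == 0)
            return predicate(this) ? this : null;

        var kept = Children.Select(c => c.Search(predicate))
                           .Where(c => c != null)
                           .ToList();
        if (kept.Count == 0) return null;

        var copy = (SizedNode)MemberwiseClone();
        copy.Children = kept;
        return copy;
    }
}
```

The trade-off is that ComputeSizes has to be called again whenever the underlying tree changes, but that is a single pass rather than an update of every parent on each insertion.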

Removing default namespace attributes in XML with C# - can't pass object by ref and then iterate

I'm currently working on a buggy bit of code that's designed to strip out all the namespaces from an XML document and re-add them in the header. We use it because we ingest very large XML documents and then re-serve them in small fragments, so each item needs to replicate the namespaces in the parent document.
The XML is first loaded as an XmlDocument and then passed to a function that removes the namespaces:
_fullXml = new XmlDocument();
_fullXml.LoadXml(itemXml);
RemoveNamespaceAttributes(_fullXml.DocumentElement);
The remove function iterates through the whole document looking for namespaces and removing them. It looks like this:
private void RemoveNamespaceAttributes(XmlNode node){
    if (node.Attributes != null)
    {
        for (int i = node.Attributes.Count - 1; i >= 0; i--)
        {
            if (node.Attributes[i].Name.Contains(':') || node.Attributes[i].Name == "xmlns")
                node.Attributes.Remove(node.Attributes[i]);
        }
    }
    foreach (XmlNode n in node.ChildNodes)
    {
        RemoveNamespaceAttributes(n);
    }
}
However, I've discovered that it doesn't work - it leaves all the namespaces intact.
If you iterate through the code with the debugger then it looks to be doing what it's supposed to - the nodes objects have their namespace attributes removed. But the original _fullXml document remains untouched. I assume this is because the function is looking at a clone of the data passed to it, rather than the original data.
So my first thought was to pass it by ref. But I can't do that because the iterative part of the function inside the foreach loop has a compile error - you can't pass the object n by reference.
Second thought was to pass the whole _fullXml document but that doesn't work either, guessing because it's still a clone.
So it looks like I need to solve the problem of passing the document by ref and then iterating through the nodes to remove all namespaces. This will require re-designing this code fragment obviously, but I can't see a good way to do it. Can anyone help?
Cheers,
Matt

To strip namespaces it could be done like this:
// Copies elements and attributes from input to output by local name only.
// Note: text content is not copied by this version.
void StripNamespaces(XElement input, XElement output)
{
    foreach (XElement child in input.Elements())
    {
        XElement clone = new XElement(child.Name.LocalName);
        output.Add(clone);
        StripNamespaces(child, clone);
    }
    foreach (XAttribute attr in input.Attributes())
    {
        // Skip xmlns / xmlns:prefix declarations; they are what we want to drop
        if (attr.IsNamespaceDeclaration)
            continue;
        try
        {
            output.Add(new XAttribute(attr.Name.LocalName, attr.Value));
        }
        catch (Exception e)
        {
            // Decide how to handle duplicate attributes
            //if(e.Message.StartsWith("Duplicate attribute"))
            //output.Add(new XAttribute(attr.Name.LocalName, attr.Value));
        }
    }
}
You can call it like so:
XElement result = new XElement("root");
StripNamespaces(NamespaceXml, result);
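The version above copies elements and attributes only. A variation that also keeps text and other non-element nodes might look like the following sketch (NamespaceStripper and Strip are hypothetical names). It builds a new tree rather than editing in place, which sidesteps the original problem: as far as I know, an element's namespace is part of its name, so removing xmlns attributes alone does not move it out of that namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class NamespaceStripper
{
    // Returns a copy of 'source' with every element and attribute moved to no namespace.
    public static XElement Strip(XElement source)
    {
        return new XElement(
            source.Name.LocalName,
            // Skip xmlns / xmlns:prefix declarations; copy the rest by local name.
            // Caveat: two attributes that differ only by namespace would collide here and throw.
            source.Attributes()
                  .Where(a => !a.IsNamespaceDeclaration)
                  .Select(a => new XAttribute(a.Name.LocalName, a.Value)),
            // Recurse into child elements; other nodes (text, comments) are copied as they are.
            source.Nodes().Select(n => n is XElement child ? Strip(child) : n));
    }
}
```

Called like XElement result = NamespaceStripper.Strip(XElement.Parse(itemXml)); the source element is left unchanged.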

I'm not 100% sure there aren't failure cases with this, but it occurs to me that you can do
string x = Regex.Replace(xml, @"(xmlns:?|xsi:?)(.*?)=""(.*?)""", "");
on the raw XML to get rid of namespaces.
It's probably not the best way to solve this, but I thought I'd put it out there.

Serialization / Deserialization of a tree structure

I'm trying to figure out the best way to save (serialize) and later open (deserialize) a tree structure. My structure is made up of various object types with different properties, but each inherits from a base abstract "Node" class.
Each node has a unique ID (GUID), and has an AddSuperNode(Node nd) method that will set the parent of a node. This in turn calls other methods that allow the parent node to know what sub nodes it has. However, some nodes also utilize an AddAuxSuperNode() method that adds a secondary parent to the Node.
I was using binary serialization, but now I think I want to use something where I have a bit more control, and the serialized data is more accessible. I also want to retain Type information when I deserialize, and be able to serialize private values. So DataContractSerializer seemed like the best way to go.
I can't just serialize the root Node directly because of nodes having multiple parents. I do not want to create duplicate objects. So it would seem that I need to deconstruct the tree into a flat list, and then serialize that. Then, after deserializing that list, reconstruct the tree. Does this sound right?
Like I said before, each Node has a unique GUID identifier, but right now Nodes reference their parents/children directly and do not store their IDs. I could update the AddSuperNode() and AddAuxSuperNode() methods to also update a list of parent IDs to be serialized in addition to the direct references. But I'd rather only update/create this list when the object is being serialized. So I was thinking of creating an UpdateSuperNodeIDRefs() method in the node that would be called right before serialization.
The following is what I'm planning to do for serialization and deserialization of this structure. Can anyone suggest a better/cleaner/more efficient way to do this?
Serialization
1) Provide the root node of the tree structure
2) Break down tree structure into a flat Dictionary(Guid id,Node nd) where id is the guid of nd.
3) Call UpdateSuperNodeIDRefs(); for each node to update the IDs it has saved for its parents.
4) Serialize the Dictionary of nodes with DataContractSerializer
Deserialization
1) Deserialize the Dictionary of nodes
2) Iterate through each Node in the Dictionary, reconnecting each to its parents. For any parent IDs stored, find the respective Node(s) in the Dictionary with matching ID(s) and call AddSuperNode() or AddAuxSuperNode() to reconnect the node to its parent(s)
3) From any Node in the Dictionary find the root of the structure
4) Return the root Node

If a node has multiple parents, then it isn't a tree; it is, presumably, a graph. However - worry not; DataContractSerializer can handle this for you:
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
class Node {
    [DataMember]
    public Node AnotherNode { get; set; }
}

static class Program
{
    static void Main()
    {
        Node a = new Node(), b = new Node();
        // make it a cyclic graph, to prove reference-mode
        a.AnotherNode = b;
        b.AnotherNode = a;
        // the preserveObjectReferences argument is the interesting one here...
        DataContractSerializer dcs = new DataContractSerializer(
            typeof(Node), null, int.MaxValue, false, true, null);
        using (MemoryStream ms = new MemoryStream())
        {
            dcs.WriteObject(ms, a);
            ms.Position = 0;
            Node c = (Node) dcs.ReadObject(ms);
            // so c.AnotherNode.AnotherNode should be back to "c"
            Console.WriteLine(ReferenceEquals(c, c.AnotherNode.AnotherNode));
        }
    }
}
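
As an alternative to the preserveObjectReferences constructor argument, the same effect can, as far as I know, be had by marking the contract itself with IsReference = true and using the plain one-argument constructor. The sketch below assumes that route; RefNode and ReferenceDemo are hypothetical names used to avoid clashing with the Node class above.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// IsReference = true asks the serializer to emit ids/references instead of duplicating objects.
[DataContract(IsReference = true)]
class RefNode
{
    [DataMember]
    public RefNode AnotherNode { get; set; }
}

static class ReferenceDemo
{
    // Serializes a two-node cycle and reports whether the cycle survives the round trip.
    public static bool RoundTripKeepsCycle()
    {
        RefNode a = new RefNode(), b = new RefNode();
        a.AnotherNode = b;
        b.AnotherNode = a;

        var dcs = new DataContractSerializer(typeof(RefNode));
        using (var ms = new MemoryStream())
        {
            dcs.WriteObject(ms, a);
            ms.Position = 0;
            var c = (RefNode)dcs.ReadObject(ms);
            return ReferenceEquals(c, c.AnotherNode.AnotherNode);
        }
    }
}
```

With a class hierarchy like the one in the question, the attribute would go on the abstract base class, so a node with two parents is written once and referenced from the second parent.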
