Huffman Tree: Traversing [closed] - c#

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I'm not sure how to approach traversing my Huffman tree. The tree is correct; I just have a hard time figuring out a good way to traverse it. For some reason, my traversal method gives no result...
UPDATE: Cleaned up the code and made it more object-oriented.
Node class:
public class Node
{
public int frekvens; //Frequency
public char tegn; //Symbol
public Node venstre; //Left child
public Node høyre; //Right child
public string s; //result string
public string resultat;
public Node (char c) // Node constructor containing symbol.
{
frekvens = 1;
tegn = c;
}
public Node (int f, Node venstre, Node høyre) // Node Constructor containing frequency and children
{
frekvens = f;
this.venstre = venstre;
this.høyre = høyre;
}
public Node (Node node) // Node constructor containing a node
{
frekvens = node.frekvens;
tegn = node.tegn;
this.venstre = venstre;
this.høyre = høyre;
}
public void ØkMed1() // Increment frequency by one
{
frekvens = frekvens + 1;
}
public char getVenstreTegn ()
{
return venstre.tegn;
}
public char getHøyreTegn ()
{
return venstre.tegn;
}
public int getVenstreFrekvens ()
{
return venstre.frekvens;
}
public int getHøyreFrekvens ()
{
return høyre.frekvens;
}
public int getFrekvens()
{
return frekvens;
}
public bool ErTegn(char c)
{
if ( c == tegn)
{
return false;
}
else
{
return true;
}
}
//Pretty sure this does not work as intended
public string traverser (Node n) //Traverse the tree
{
if (n.tegn != '\0') //If the node contains a symbol --> a leaf
{
resultat += s;
}
else
{
if (n.getVenstreTegn() == '\0') //If left child does not have a symbol
{
s += "0";
traverser(n.venstre);
}
if (n.getHøyreTegn() == '\0') //If right child does not have a symbol
{
s += "1";
traverser(n.høyre);
}
}
return resultat;
}
public string Resultat() //Used previously to check if I got the correct Huffman tree
{
string resultat;
resultat = "Tegn: " + Convert.ToString(tegn) +" frekvens: " + Convert.ToString(frekvens) + "\n";
return resultat;
}
}
Huffman_Tree Class:
public class Huffman_Tre
{
string treString;
List<Node> noder = new List<Node>();
public Node rot;
public void bygg (string input)
{
bool funnet; //Found
char karakter; //character
for (int i = 0; i < input.Length;i++) //Loops through the string and records each character
//with its corresponding frequency in the node list
{
karakter = input[i];
funnet = false; //default
for (int j = 0; j< noder.Count; j++)
{
if (noder[j].ErTegn(karakter) == false) //if the character already exists
{
noder[j].ØkMed1(); //increment frequency by one
funnet = true;
break;
}
}
if (!funnet) //if the character does not exist
{
noder.Add(new Node(karakter)); //add the character to list
}
}
//Sorting node list ascending by frequency
var sortertListe = noder.OrderBy(c => c.frekvens).ToList();
noder = sortertListe;
do
{
noder.Add(new Node((noder[0].frekvens + noder[1].frekvens), noder[0],noder[1]));
//Remove the leaf nodes
noder.RemoveAt(0);
noder.RemoveAt(0);
} while(noder.Count >= 2);
}
public Node getRot()
{
return rot;
}
public string visTre()
{
foreach (Node node in noder)
{
treString += node.Resultat();
}
return treString;
}
public bool erNull()
{
if (noder[0].tegn == '\0')
{
return true;
}
else
return false;
}
}
Main Program:
private void btnKomprimer_Click(object sender, System.Windows.RoutedEventArgs e)
{
string input; //The string input I want to compress
input = txtInput.Text; //initialize input to text input
input = input.ToLower();
txtOutput.Text = "";
Huffman_Tre tre = new Huffman_Tre();
tre.bygg(input);
Node rot = new Node(tre.getRot());
txtOutput.Text += rot.traverser(rot);
}
}

As I had a little time left, I worked out an example of a Huffman tree while playing with C# 6.0. It's not optimized (not even close!), but it works fine as an example, and it should help you see where your 'challenge' may lie. As my English is far better than my Scandinavian, I used English naming; I hope you don't mind.
First, let's start with the class that keeps the frequencies.
public sealed class HuffmanFrequencyTable
{
#region Properties
/// <summary>
/// Holds the characters and their corresponding frequencies
/// </summary>
public Dictionary<char, int> FrequencyTable { get; set; } = new Dictionary<char, int>();
#endregion
#region Methods
/// <summary>
/// Clears the internal frequency table
/// </summary>
public void Clear()
{
FrequencyTable?.Clear();
}
/// <summary>
/// Accepts and parses a new line (string) which is then
/// merged with the existing dictionary or frequency table
/// </summary>
/// <param name="line">The line to parse</param>
public void Accept(string line)
{
if (!string.IsNullOrEmpty(line))
{
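foreach (var group in line.GroupBy(ch => ch))
{
int count;
FrequencyTable.TryGetValue(group.Key, out count); //count stays 0 if the character is new
FrequencyTable[group.Key] = count + group.Count(); //Merge with the existing frequency
}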
}
}
/// <summary>
/// Performs a dump of the frequency table, ordering all characters, lowest frequency first.
/// </summary>
/// <returns>The frequency table in the format 'character [frequency]'</returns>
public override string ToString()
{
return FrequencyTable?.PrintFrequencies();
}
#endregion
}
Please note that the ToString() method uses an extension method that can 'dump' the contents of the Dictionary. The extension is located in a static class called Helpers and looks like this:
/// <summary>
/// Extension method that helps to write the contents of a generic Dictionary to a string, ordered by its values,
/// printing each key followed by its value between brackets.
/// </summary>
/// <typeparam name="TKey">Generic key</typeparam>
/// <typeparam name="TValue">Generic value type</typeparam>
/// <param name="dictionary">The dictionary</param>
/// <exception cref="ArgumentNullException">Throws an argument null exception if the provided dictionary is null</exception>
/// <returns></returns>
public static string PrintFrequencies<TKey, TValue>(this IDictionary<TKey, TValue> dictionary)
{
if (dictionary == null)
throw new ArgumentNullException("dictionary");
var items = from kvp in dictionary
orderby kvp.Value
select kvp.Key + " [" + kvp.Value + "]";
return string.Join(Environment.NewLine, items);
}
Now, with this FrequencyTable in place, we can look at how to build the nodes. Huffman coding works with a binary tree, so it's best to create a node class with a left and a right child node. I also took the liberty of implementing the traversal algorithm here. The class is built as follows:
public sealed class HuffmanNode
{
#region Properties
/// <summary>
/// Holds the left node, if applicable, otherwise null
/// </summary>
public HuffmanNode Left { get; set; } = null;
/// <summary>
/// Holds the right node, if applicable, otherwise null
/// </summary>
public HuffmanNode Right { get; set; } = null;
/// <summary>
/// Holds the Character (or null) for this particular node
/// </summary>
public char? Character { get; set; } = null;
/// <summary>
/// Holds the frequency for this particular node, defaulted to 0
/// </summary>
public int Frequency { get; set; } = default(int);
#endregion
#region Methods
/// <summary>
/// Traverses all nodes recursively returning the binary
/// path for the corresponding character that has been found.
/// </summary>
/// <param name="character">The character to find</param>
/// <param name="data">The datapath (containing '1's and '0's)</param>
/// <returns>The complete binary path for a character within a node</returns>
public List<bool> Traverse(char? character, List<bool> data)
{
//Check the leaves for existing characters
if (null == Left && null == Right)
{
//We're at an endpoint of our 'tree', so return its data, or null when the symbol
//characters do not match
return character == Character ? data : null;
}
else
{
List<bool> left = null;
List<bool> right = null;
//TODO: If possible refactor with proper C# 6.0 features
if (null != Left)
{
List<bool> leftPath = new List<bool>(data);
leftPath.Add(false); //Add a '0'
left = Left.Traverse(character, leftPath); //Recursive traversal for child nodes within this left node.
}
if (null != Right)
{
List<bool> rightPath = new List<bool>(data);
rightPath.Add(true); //Add a '1'
right = Right.Traverse(character, rightPath); //Recursive traversal for childnodes within this right node
}
return (null != left) ? left : right;
}
}
#endregion
}
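A variation worth considering: Traverse above searches the whole tree once for every character that is encoded. The sketch below instead walks the tree only once and records every leaf's code in a dictionary. The path prefix is passed down as a parameter rather than accumulated in a shared field (in the question, the digits appended to s are never removed when the recursion returns). CodeNode and HuffmanCodes are hypothetical names for this sketch; CodeNode simply mirrors the shape of HuffmanNode.

```csharp
using System.Collections.Generic;

// Hypothetical node type for this sketch; same shape as HuffmanNode above.
public sealed class CodeNode
{
    public CodeNode Left { get; set; }
    public CodeNode Right { get; set; }
    public char? Character { get; set; }
}

public static class HuffmanCodes
{
    // Walks the tree once and returns the '0'/'1' code of every leaf character.
    public static Dictionary<char, string> Build(CodeNode root)
    {
        var table = new Dictionary<char, string>();
        if (root != null)
            Walk(root, string.Empty, table);
        return table;
    }

    // The prefix is passed by value, so each call has its own path and
    // nothing has to be removed again when the recursion returns.
    private static void Walk(CodeNode node, string prefix, Dictionary<char, string> table)
    {
        if (node.Left == null && node.Right == null)
        {
            // A tree consisting of a single leaf would produce an empty code, so use "0".
            table[node.Character.Value] = prefix.Length > 0 ? prefix : "0";
            return;
        }
        if (node.Left != null) Walk(node.Left, prefix + "0", table);   // left edge = 0
        if (node.Right != null) Walk(node.Right, prefix + "1", table); // right edge = 1
    }
}
```

Encoding then becomes one dictionary lookup per character. Apart from the single-leaf special case, the codes are the same paths Traverse returns, just written as '0'/'1' strings instead of a List<bool>.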
I use the node class within the HuffmanTree class since, logically, a tree is built from nodes. The corresponding HuffmanTree is written this way:
public sealed class HuffmanTree
{
#region Fields
/// <summary>
/// Field for keeping the Huffman nodes in. Internally used.
/// </summary>
private List<HuffmanNode> nodes = new List<HuffmanNode>();
#endregion
#region Properties
/// <summary>
/// Holds the Huffman tree
/// </summary>
public HuffmanNode Root { get; set; } = null;
/// <summary>
/// Holds the frequency table for all parsed characters
/// </summary>
public HuffmanFrequencyTable Frequencies { get; private set; } = new HuffmanFrequencyTable();
/// <summary>
/// Holds the number of bits after encoding the tree.
/// Primarily used for decoding.
/// </summary>
public int BitCountForTree { get; private set; } = default(int);
#endregion
#region Methods
/// <summary>
/// Builds the Huffman tree
/// </summary>
/// <param name="source">The source to build the Hufftree from</param>
/// <exception cref="ArgumentNullException">Thrown when source is null or empty</exception>
public void BuildTree(string source)
{
nodes.Clear(); //As we build a new tree, first make sure it's clean :)
Frequencies.Clear(); //Also forget the frequencies of any previously parsed input
if (string.IsNullOrEmpty(source))
throw new ArgumentNullException("source");
else
{
Frequencies.Accept(source);
foreach (KeyValuePair<char, int> symbol in Frequencies.FrequencyTable)
{
nodes.Add(new HuffmanNode() { Character = symbol.Key, Frequency = symbol.Value });
}
while (nodes.Count > 1)
{
List<HuffmanNode> orderedNodes = nodes.OrderBy(node => node.Frequency).ToList();
if (orderedNodes.Count >= 2)
{
List<HuffmanNode> takenNodes = orderedNodes.Take(2).ToList();
HuffmanNode parent = new HuffmanNode()
{
Character = null,
Frequency = takenNodes[0].Frequency + takenNodes[1].Frequency,
Left = takenNodes[0],
Right = takenNodes[1]
};
//Remove the childnodes from the original node list and add the new parent node
nodes.Remove(takenNodes[0]);
nodes.Remove(takenNodes[1]);
nodes.Add(parent);
}
}
Root = nodes.FirstOrDefault();
}
}
/// <summary>
/// Encodes a given string to the corresponding huffman encoding path
/// </summary>
/// <param name="source">The source to encode</param>
/// <returns>The binary huffman representation of the source</returns>
public BitArray Encode(string source)
{
if (!string.IsNullOrEmpty(source))
{
List<bool> encodedSource = new List<bool>();
//Traverse the tree for each character in the passed source (string) and add the binary path to the encoded source
encodedSource.AddRange(source.SelectMany(character =>
Root.Traverse(character, new List<bool>())
).ToList()
);
//For decoding, we might need the amount of bits to skip trailing bits.
BitCountForTree = encodedSource.Count;
return new BitArray(encodedSource.ToArray());
}
else return null;
}
/// <summary>
/// Decodes a given binary path to its string value
/// </summary>
/// <param name="bits">BitArray for traversing the tree</param>
/// <returns>The decoded string</returns>
public string Decode(BitArray bits)
{
HuffmanNode current = Root;
string decodedString = string.Empty;
foreach (bool bit in bits)
{
//Find the correct current node depending on the bit set or not set.
current = (bit ? current.Right ?? current : current.Left ?? current);
if (current.IsLeaf())
{
decodedString += current.Character;
current = Root;
}
}
return decodedString;
}
#endregion
}
What is interesting in this code is that I decided to use BitArrays to hold the binary paths for the tree once it is built. The public BitArray Encode(string source) method contains a dirty hack: I keep track of the total number of bits used for encoding and store it in the BitCountForTree property. When decoding, I use this property to discard any trailing bits that may arise. There is a much nicer way to do this, but I'll leave that for you to find out.
Also, this class makes use of an extension method written for the HuffmanNode. It's a simple one though:
/// <summary>
/// Determines whether a given Huffman node is a leaf or not.
/// A node is considered to be a leaf when it has no childnodes
/// </summary>
/// <param name="node">A huffman node</param>
/// <returns>True if no children are left, false otherwise</returns>
public static bool IsLeaf(this HuffmanNode node)
{
return (null == node.Left && null == node.Right);
}
This extension method is a convenient way to determine whether a given node is actually a leaf node. A leaf is a node that has no child nodes and thus marks the end of a branch of the binary tree.
Now the interesting part: how to make things work. I built a Windows Forms application with three text boxes: one for the actual input, one for the binary (encoded) output and one for showing the compressed result.
I also placed two simple buttons, one to perform the Huffman encoding and one for the Huffman decoding.
The Huffman encoding method is written as follows (directly in the event handler of the encode button):
string input = tbInput.Text;
Tree.BuildTree(input); //Build the huffman tree
BitArray encoded = Tree.Encode(input); //Encode the tree
//First show the generated binary output
tbBinaryOutput.Text = string.Join(string.Empty, encoded.Cast<bool>().Select(bit => bit ? "1" : "0"));
//Next, convert the binary output to the new characterized output string.
byte[] bytes = new byte[(encoded.Length / 8) + 1];
encoded.CopyTo(bytes, 0);
tbOutput.Text = Encoding.Default.GetString(bytes); //Write the compressed output to the textbox.
Note that the encoded binary string does not contain any trailing bits; padding up to a whole byte only appears once the bits are copied into the byte array. The downside is that I have to keep track of the real bit count when decoding.
Decoding is not too hard either. For this example, though, I use the compressed output generated by the encoding code above, and I assume that the Huffman tree (and its frequency table!) has already been built. Normally, the frequency table is stored within the compressed file so that the tree can be rebuilt.
//First convert the compressed output back to a bit array and skip the trailing bits.
bool[] boolAr = new BitArray(Encoding.Default.GetBytes(tbOutput.Text)).Cast<bool>().Take(Tree.BitCountForTree).ToArray();
BitArray encoded = new BitArray( boolAr );
string decoded = Tree.Decode(encoded);
MessageBox.Show(decoded, "Decoded result: ", MessageBoxButtons.OK, MessageBoxIcon.Information);
Please pay attention to the dirty hack here: although Encoding.Default.GetBytes(tbOutput.Text) does produce a byte array, that array may end with padding bits that must not be decoded. That is why I only take the number of bits I actually need, as stored in the tree's BitCountForTree property during encoding.
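One caveat with this hack: turning arbitrary bytes into a string and back is not guaranteed to be lossless. Not every byte sequence is valid text, and on .NET Core and .NET 5+ Encoding.Default is UTF-8, where invalid sequences are replaced during decoding. The sketch below assumes the compressed data can be kept as a byte[] (for example in a file) instead of a text box. It stores the bit count in a 4-byte header in front of the packed bits, so the decoder no longer needs BitCountForTree. BitPacking, Pack and Unpack are hypothetical names.

```csharp
using System;
using System.Collections;

public static class BitPacking
{
    // Layout: 4-byte little-endian bit count, followed by the bits packed 8 per byte.
    public static byte[] Pack(BitArray bits)
    {
        int bitCount = bits.Length;
        byte[] packed = new byte[4 + (bitCount + 7) / 8];
        packed[0] = (byte)bitCount;
        packed[1] = (byte)(bitCount >> 8);
        packed[2] = (byte)(bitCount >> 16);
        packed[3] = (byte)(bitCount >> 24);
        bits.CopyTo(packed, 4); // bit i goes to byte 4 + i / 8, least significant bit first
        return packed;
    }

    public static BitArray Unpack(byte[] packed)
    {
        int bitCount = packed[0] | (packed[1] << 8) | (packed[2] << 16) | (packed[3] << 24);
        byte[] data = new byte[packed.Length - 4];
        Array.Copy(packed, 4, data, 0, data.Length);
        var all = new BitArray(data);     // includes the padding bits of the last byte
        var result = new BitArray(bitCount);
        for (int i = 0; i < bitCount; i++)
            result[i] = all[i];           // copy only the real bits
        return result;
    }
}
```

If the result really has to be shown as text, Convert.ToBase64String and Convert.FromBase64String round-trip the packed bytes safely.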
So when running, my example produces the following output when using the 'world-renowned sentence' "The quick brown fox jumps over the lazy programmer":
[Screenshot: after pressing the "Huff encode" button]
[Screenshot: after pressing the "Huff decode" button]
This code could really use some optimizations; for example, you might consider using arrays instead of dictionaries. There are more, but I'll leave those up to you.

Related

SQL Server CLR function aggregation of sorted string

In order to get a sorted aggregated string, I wrote the CLR function below. However, it always returns an empty string instead of what I expect, such as "001, 002, 003". I tried to debug the CLR function in Visual Studio 2017, but it threw the error message:
The operation could not be completed. Unspecified error
Code:
[Serializable]
[SqlUserDefinedAggregate(
Format.UserDefined, //use clr serialization to serialize the intermediate result
Name = "CLRSortedCssvAgg", //aggregate name on sql
IsInvariantToNulls = true, //optimizer property
IsInvariantToDuplicates = false, //optimizer property
IsInvariantToOrder = false, //optimizer property
IsNullIfEmpty = false, //optimizer property
MaxByteSize = -1) //maximum size in bytes of persisted value
]
public class SortedCssvConcatenateAgg : IBinarySerialize
{
/// <summary>
/// The variable that holds all the strings to be aggregated.
/// </summary>
List<string> aggregationList;
StringBuilder accumulator;
/// <summary>
/// Separator between concatenated values.
/// </summary>
const string CommaSpaceSeparator = ", ";
/// <summary>
/// Initialize the internal data structures.
/// </summary>
public void Init()
{
accumulator = new StringBuilder();
aggregationList = new List<string>();
}
/// <summary>
/// Accumulate the next value, not if the value is null or empty.
/// </summary>
public void Accumulate(SqlString value)
{
if (value.IsNull || String.IsNullOrEmpty(value.Value))
{
return;
}
aggregationList.Add(value.Value);
}
/// <summary>
/// Merge the partially computed aggregate with this aggregate.
/// </summary>
/// <param name="other"></param>
public void Merge(SortedCssvConcatenateAgg other)
{
aggregationList.AddRange(other.aggregationList);
}
/// <summary>
/// Called at the end of aggregation, to return the results of the aggregation.
/// </summary>
/// <returns></returns>
public SqlString Terminate()
{
if (aggregationList != null && aggregationList.Count > 0)
{
aggregationList.Sort();
accumulator.Append(string.Join(CommaSpaceSeparator, aggregationList));
aggregationList.Clear();
}
return new SqlString(accumulator.ToString());
}
public void Read(BinaryReader r)
{
accumulator = new StringBuilder(r.ReadString());
}
public void Write(BinaryWriter w)
{
w.Write(accumulator.ToString());
}
}
You are close. Just need a few minor adjustments. Do the following and it will work (I tested it):
Remove all references to accumulator. It is not used.
Replace the Terminate(), Read(), and Write() methods with the following:
public SqlString Terminate()
{
string _Aggregation = null;
if (aggregationList != null && aggregationList.Count > 0)
{
aggregationList.Sort();
_Aggregation = string.Join(CommaSpaceSeparator, aggregationList);
}
return new SqlString(_Aggregation);
}
public void Read(BinaryReader r)
{
int _Count = r.ReadInt32();
aggregationList = new List<string>(_Count);
for (int _Index = 0; _Index < _Count; _Index++)
{
aggregationList.Add(r.ReadString());
}
}
public void Write(BinaryWriter w)
{
w.Write(aggregationList.Count);
foreach (string _Item in aggregationList)
{
w.Write(_Item);
}
}
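The key point is that Read must consume exactly what Write produced, in the same order. A way to check that outside SQL Server is to run the same logic against a MemoryStream, roughly what happens when the engine persists an intermediate aggregate and loads it back. ListSerializer and RoundTrip are hypothetical names; this is a test helper, not the SqlUserDefinedAggregate itself.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper with the same Write/Read logic as the answer above,
// so the round trip can be checked in a plain console or unit-test project.
public static class ListSerializer
{
    public static void Write(BinaryWriter w, List<string> items)
    {
        w.Write(items.Count);          // Int32 count first...
        foreach (string item in items)
            w.Write(item);             // ...then each length-prefixed string
    }

    public static List<string> Read(BinaryReader r)
    {
        int count = r.ReadInt32();     // must mirror Write exactly
        var items = new List<string>(count);
        for (int i = 0; i < count; i++)
            items.Add(r.ReadString());
        return items;
    }

    // Serializes to memory and reads the result back.
    public static List<string> RoundTrip(List<string> items)
    {
        var stream = new MemoryStream();
        var writer = new BinaryWriter(stream);
        Write(writer, items);
        writer.Flush();
        stream.Position = 0;
        return Read(new BinaryReader(stream));
    }
}
```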
That said, I'm not sure if this approach is faster or slower than the FOR XML approach, but a UDA certainly makes for a more readable query, especially if you need multiple aggregations.
Still, I should mention that starting in SQL Server 2017, this became a built-in function: STRING_AGG (which allows for sorting via the WITHIN GROUP (ORDER BY ... ) clause).
In your Accumulate and Merge, you're working with aggregationList; in Read and Write, you're working with accumulator. You should pick one and use it in all four methods. As I understand it, Read and Write are used when the engine needs to persist intermediate results to a work table. In your case, when it does that, it persists only your empty StringBuilder.

Is this not an O(n) algorithm?

I'm trying to figure out why my algorithm times out on some test cases; it passes all the ones that don't time out. As far as I can tell, it is an O(n) algorithm, since it runs a sequence of O(n) steps, which makes it curious to me why it is timing out. I can't think of a way to significantly reduce the number of operations involved here (I can think of slight optimizations from using leaner data structures, but those don't reduce the complexity).
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
/// <summary>
///
/// Solution to https://www.hackerrank.com/challenges/cut-the-tree
///
/// Explanation of algorithm:
///
/// Given a tree like
///
///      Val=100
///             \
///              Val=200
///             /       \
///            /         Val=100
///       Val=100
///       /     \
///  Val=500   Val=600
///
/// set a field for each node showing the sum of the values
/// in the subtree whose root is that node, making it into
///
///      Val=100
///      Sum=1600
///             \
///              Val=200
///              Sum=1500
///             /       \
///            /         Val=100
///           /          Sum=100
///       Val=100
///       Sum=1200
///       /     \
///  Val=500   Val=600
///  Sum=500   Sum=600
///
/// Then we can easily find minimum difference between the sum of
/// two trees that result from severing a branch: if the root node
/// is R and we sever node N, then the difference between the two
/// sums is |R.Sum - 2 * N.Sum|.
///
/// </summary>
class Node
{
public int Val { get; set; }
public Node Parent { get; set; } = null;
public List<Node> Neighbors { get; set; } = new List<Node>();
/// <summary>
/// Sum of values in descendant nodes
/// </summary>
public int DescendantsSum { get; set; } = 0;
/// <summary>
/// Sum of values in tree whose root is this node
/// </summary>
public int TreeSum { get { return Val + DescendantsSum; } }
}
class Solution
{
/// <summary>
/// Builds the parent relation between nodes
/// Complexity: O(n) where n is the number of nodes
/// </summary>
static Node BuildToTree(Node[] nodes)
{
Node root = nodes[0]; // use arbitrary node as the root
var Q = new Queue<Node>();
Q.Enqueue(root);
while(Q.Count > 0)
{
var current = Q.Dequeue();
foreach(var neighbor in current.Neighbors.Where(nbr => nbr != current.Parent && nbr.Parent == null))
{
neighbor.Parent = current;
Q.Enqueue(neighbor);
}
}
return root;
}
/// <summary>
/// Sets the sums of the descendant trees of each node
/// Complexity: O(n) where n is the number of nodes
/// </summary>
static void SetSums(Node[] nodes)
{
foreach(var node in nodes)
for (var parent = node.Parent; parent != null; parent = parent.Parent)
parent.DescendantsSum += node.Val;
}
/// <summary>
/// Gets the minimum difference between the sum of
/// two trees that result from severing a branch.
/// </summary>
static int MinDiff(Node[] nodes, Node root)
{
return nodes
.Skip(1)
.Min(node => Math.Abs(root.TreeSum - 2 * node.TreeSum));
}
static void Main(String[] args)
{
string curdir = Directory.GetCurrentDirectory();
System.IO.StreamReader file = new System.IO.StreamReader(
Path.GetFullPath(Path.Combine(curdir, @"..\..\", "TestFiles\\SampleInput.txt"))
);
int N = Int32.Parse(file.ReadLine());
int[] vals = Array.ConvertAll(file.ReadLine().Split(' '), Int32.Parse);
Node[] nodes = vals.Select(val => new Node() { Val = val }).ToArray();
for (int i = 0, n = N - 1; i < n; ++i)
{
int[] pair = Array.ConvertAll(file.ReadLine().Split(' '), Int32.Parse);
int p = pair[0] - 1, d = pair[1] - 1;
nodes[p].Neighbors.Add(nodes[d]);
nodes[d].Neighbors.Add(nodes[p]);
}
Node root = BuildToTree(nodes);
SetSums(nodes);
Console.WriteLine(MinDiff(nodes, root));
}
}
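Your SetSums() function is O(n^2) in the worst case: consider a tree whose nodes are all linked into a single chain, where each node walks up through every one of its ancestors (1 + 2 + ... + (n-1) steps in total). You should walk the tree in post-order, or in reverse topological order, and calculate the sum of each parent from the sums of its children.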
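A minimal sketch of that idea, using the same fields as the question's Node class (reproduced as a hypothetical SumNode so the example stands alone). It records the BFS visiting order, in which every parent appears before its children, and then walks that order backwards, so each child's subtree sum is complete before it is added to its parent. Iterating instead of recursing also avoids very deep call stacks when the tree degenerates into a long chain.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the question's Node class (same fields).
public class SumNode
{
    public int Val;
    public SumNode Parent;
    public List<SumNode> Neighbors = new List<SumNode>();
    public int DescendantsSum;
    public int TreeSum => Val + DescendantsSum;
}

public static class SubtreeSums
{
    public static void SetSums(SumNode root)
    {
        // BFS from the root: assign parents and remember the visiting order.
        var order = new List<SumNode>();
        var queue = new Queue<SumNode>();
        queue.Enqueue(root);
        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            order.Add(current);
            foreach (var neighbor in current.Neighbors)
            {
                if (neighbor == current.Parent) continue; // in a tree, only the parent is already visited
                neighbor.Parent = current;
                queue.Enqueue(neighbor);
            }
        }
        // Reverse BFS order visits children before parents, so each node adds
        // its finished subtree sum to its parent exactly once: O(n) overall.
        for (int i = order.Count - 1; i >= 0; i--)
        {
            var node = order[i];
            if (node.Parent != null)
                node.Parent.DescendantsSum += node.TreeSum;
        }
    }
}
```

With sums computed this way, MinDiff from the question can stay unchanged.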

Go through the AST tree and get the child node value (Irony)

I want to go through the AST and find the child node equal to "IfStatement", but there could be many nested child nodes, as in the code below. Is there a way to do this recursively, visiting the child nodes and checking each value?
MyGrammar grammar = new MyGrammar ();
Parser parser = new Parser(grammar);
var result = parser.Parse(textBox.Text);
var IfNode=result.Root.ChildNodes[0].ChildNodes[0].ChildNodes[1].ChildNodes[0].ToString() == "IfStatement";
I tried something like this, but it doesn't work:
var IfCheck = result.Root.ChildNodes.FindAll(x => x.ChildNodes.ToString() == "IfStatement");
You can traverse your tree:
/// <summary>
/// Parser extension methods
/// </summary>
public static class ParserExt
{
/// <summary>
/// Converts parser nodes tree to flat collection
/// </summary>
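/// <param name="item">The node to start from (usually the root)</param>
/// <param name="childSelector">Returns the children of a node</param>
/// <returns>All nodes of the tree in depth-first pre-order</returns>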
public static IEnumerable<ParseTreeNode> Traverse(this ParseTreeNode item, Func<ParseTreeNode, IEnumerable<ParseTreeNode>> childSelector)
{
var stack = new Stack<ParseTreeNode>();
stack.Push(item);
while (stack.Any())
{
var next = stack.Pop();
yield return next;
var childs = childSelector(next).ToList();
for (var childId = childs.Count - 1; childId >= 0; childId--)
{
stack.Push(childs[childId]);
}
}
}
}
Then, just loop through:
var nodes = result.Root.Traverse(node => node.ChildNodes);
var ifStatements = nodes.Where(node => node.Term.Name.Equals("IfStatement"));
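The Traverse method above depends on Irony only through the node type, so the same explicit-stack, pre-order walk can be written generically for any tree. A sketch (TreeWalk.PreOrder is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;

public static class TreeWalk
{
    // Depth-first, pre-order walk using an explicit stack, like the Traverse
    // extension above but for any node type T.
    public static IEnumerable<T> PreOrder<T>(T root, Func<T, IEnumerable<T>> childSelector)
    {
        var stack = new Stack<T>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            T next = stack.Pop();
            yield return next;
            // Push the children in reverse so the first child is visited first.
            var children = new List<T>(childSelector(next) ?? Array.Empty<T>());
            for (int i = children.Count - 1; i >= 0; i--)
                stack.Push(children[i]);
        }
    }
}
```

With Irony, TreeWalk.PreOrder(result.Root, node => node.ChildNodes).Where(node => node.Term.Name == "IfStatement") should give the same result as the extension method version.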

How can I convert a list of filenames to a tree structure?

I have a string array of some file paths:
path/to/folder/file.xxx
path/to/other/
path/to/file/file.xx
path/file.x
path/
How can I convert this list to a tree structure? So far I have the following:
/// <summary>
/// Enumerates types of filesystem nodes.
/// </summary>
public enum FilesystemNodeType
{
/// <summary>
/// Indicates that the node is a file.
/// </summary>
File,
/// <summary>
/// Indicates that the node is a folder.
/// </summary>
Folder
}
/// <summary>
/// Represents a file or folder node.
/// </summary>
public class FilesystemNode
{
private readonly ICollection<FilesystemNode> _children;
/// <summary>
/// Initializes a new instance of the <see cref="FilesystemNode"/> class.
/// </summary>
public FilesystemNode()
{
_children = new LinkedList<FilesystemNode>();
}
/// <summary>
/// Gets or sets the name of the file or folder.
/// </summary>
public string Name { get; set; }
/// <summary>
/// Gets or sets the full path to the file or folder from the root.
/// </summary>
public string Path { get; set; }
/// <summary>
/// Gets or sets a value indicating whether the node is a file or folder.
/// </summary>
public FilesystemNodeType Type { get; set; }
/// <summary>
/// Gets a list of child nodes of this node. The node type must be a folder to have children.
/// </summary>
public ICollection<FilesystemNode> Children
{
get
{
if (Type == FilesystemNodeType.Folder)
return _children;
throw new InvalidOperationException("File nodes cannot have children");
}
}
}
I'm just a bit at a loss as to how to actually split up the paths. Any path that ends with a / is a directory; any path that doesn't is a file.
Also, while my input will always contain a path to the folder, how would I account for the situation where it did not?
For example, if I had the input:
path/to/file.c
path/file.c
path/
How would I account for the fact that path/to/ is not in the input?
Here is a solution that generates a nested dictionary of NodeEntry items (you can substitute your file info class as needed):
public class NodeEntry
{
public NodeEntry()
{
this.Children = new NodeEntryCollection();
}
public string Key { get; set; }
public NodeEntryCollection Children { get; set; }
}
public class NodeEntryCollection : Dictionary<string, NodeEntry>
{
public void AddEntry(string sEntry, int wBegIndex)
{
if (wBegIndex < sEntry.Length)
{
string sKey;
int wEndIndex;
wEndIndex = sEntry.IndexOf("/", wBegIndex);
if (wEndIndex == -1)
{
wEndIndex = sEntry.Length;
}
sKey = sEntry.Substring(wBegIndex, wEndIndex - wBegIndex);
if (!string.IsNullOrEmpty(sKey)) {
NodeEntry oItem;
if (this.ContainsKey(sKey)) {
oItem = this[sKey];
} else {
oItem = new NodeEntry();
oItem.Key = sKey;
this.Add(sKey, oItem);
}
// Now add the rest to the new item's children
oItem.Children.AddEntry(sEntry, wEndIndex + 1);
}
}
}
}
To use the above, create a new collection:
NodeEntryCollection cItems = new NodeEntryCollection();
then, for each line in your list:
cItems.AddEntry(sLine, 0);
Inspired by competent_tech's answer, I replaced the Dictionary<string, NodeEntry> with a "simple" ObservableCollection<NodeEntry>, because with the dictionary the "Key" information is stored twice: once as the key of the Dictionary and once as a public property of the NodeEntry class.
So my sample based on the "competent_tech" code looks like the following:
public class NodeEntryObservableCollection : ObservableCollection<NodeEntry>
{
public const string DefaultSeparator = "/";
public NodeEntryObservableCollection(string separator = DefaultSeparator)
{
Separator = separator; // default separator
}
/// <summary>
/// Gets or sets the separator used to split the hierarchy.
/// </summary>
/// <value>
/// The separator.
/// </value>
public string Separator { get; set; }
public void AddEntry(string entry)
{
AddEntry(entry, 0);
}
/// <summary>
/// Parses and adds the entry to the hierarchy, creating any parent entries as required.
/// </summary>
/// <param name="entry">The entry.</param>
/// <param name="startIndex">The start index.</param>
public void AddEntry(string entry, int startIndex)
{
if (startIndex >= entry.Length)
{
return;
}
var endIndex = entry.IndexOf(Separator, startIndex);
if (endIndex == -1)
{
endIndex = entry.Length;
}
var key = entry.Substring(startIndex, endIndex - startIndex);
if (string.IsNullOrEmpty(key))
{
return;
}
NodeEntry item;
item = this.FirstOrDefault(n => n.Key == key);
if (item == null)
{
item = new NodeEntry(Separator) { Key = key };
Add(item);
}
// Now add the rest to the new item's children
item.Children.AddEntry(entry, endIndex + 1);
}
}
public class NodeEntry
{
public string Key { get; set; }
public NodeEntryObservableCollection Children { get; set; }
public NodeEntry(string separator = NodeEntryObservableCollection.DefaultSeparator)
{
Children = new NodeEntryObservableCollection(separator);
}
}
This helps me in binding the data in a TreeView like this:
<TreeView Name="trvMyTreeView">
<TreeView.ItemTemplate>
<HierarchicalDataTemplate DataType="{x:Type local:NodeEntry}" ItemsSource="{Binding Children}">
<TextBlock Text="{Binding Key}"/>
</HierarchicalDataTemplate>
</TreeView.ItemTemplate>
</TreeView>
With sample code-behind like this:
IList<string> pathes = new List<string>
{
"localhost",
"remotehost.levelD.levelDB",
"localhost.level1.level11",
"localhost.level1",
"remotehost.levelD.levelDA",
"localhost.level2.level22",
"remotehost.levelA",
"remotehost",
"remotehost.levelB",
"remotehost.levelD",
"localhost.level2",
"remotehost.levelC"
};
SortedSet<string> sortedPathes = new SortedSet<string>(pathes);
var obsCollection = new NodeEntryObservableCollection(".");
foreach (var p in sortedPathes) { obsCollection.AddEntry(p); }
trvMyTreeView.ItemsSource = obsCollection;
Split each line by the '/' character. If the resulting string array has length 5, for example, the first four items are directories, and you have to test the last one for an extension:
string.IsNullOrEmpty(new FileInfo("test").Extension)
If, like in your case, there is always a '/' even for the last directory, then the last item of the split string array is empty.
The rest is just about traversing your tree. When parsing an item, check whether the first directory exists in the Children property of your root node. If it does not exist, add it; if it does, use the existing node and continue with the next part of the path.
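A sketch of this split-and-walk approach that also covers the case from the question where an intermediate folder such as path/to/ is missing from the input: folders are created on demand while walking each path. It uses a trailing '/' as the folder marker, as described in the question, rather than testing for an extension. PathNode and PathTree are hypothetical names; the question's FilesystemNode could be used instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified node for this sketch.
public class PathNode
{
    public string Name { get; set; }
    public bool IsFolder { get; set; }
    public Dictionary<string, PathNode> Children { get; } = new Dictionary<string, PathNode>();
}

public static class PathTree
{
    // A path ending in '/' is a folder; otherwise its last segment is a file.
    // Every earlier segment is a folder and is created if it does not exist yet.
    public static PathNode Build(IEnumerable<string> paths)
    {
        var root = new PathNode { Name = "", IsFolder = true };
        foreach (string path in paths)
        {
            bool endsWithSlash = path.EndsWith("/");
            string[] parts = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
            PathNode current = root;
            for (int i = 0; i < parts.Length; i++)
            {
                bool isFolder = i < parts.Length - 1 || endsWithSlash;
                if (!current.Children.TryGetValue(parts[i], out PathNode child))
                {
                    child = new PathNode { Name = parts[i] };
                    current.Children.Add(parts[i], child);
                }
                child.IsFolder |= isFolder; // a segment that is a folder in any path stays a folder
                current = child;
            }
        }
        return root;
    }
}
```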

URL Slugify algorithm in C#?

So I have searched and browsed through the slug tag on SO and only found two compelling solutions:
Slugify and Character Transliteration in C#
How to convert super- or subscript to normal text in C#
These are only partial solutions to the problem. I could code this up myself, but I'm surprised that there isn't already a solution out there.
So, is there a slugify algorithm implementation in C# and/or .NET that properly addresses Latin characters, Unicode and various other language issues?
http://predicatet.blogspot.com/2009/04/improved-c-slug-generator-or-how-to.html
public static string GenerateSlug(this string phrase)
{
string str = phrase.RemoveAccent().ToLower();
// invalid chars
str = Regex.Replace(str, @"[^a-z0-9\s-]", "");
// convert multiple spaces into one space
str = Regex.Replace(str, @"\s+", " ").Trim();
// cut and trim
str = str.Substring(0, str.Length <= 45 ? str.Length : 45).Trim();
str = Regex.Replace(str, @"\s", "-"); // hyphens
return str;
}
public static string RemoveAccent(this string txt)
{
byte[] bytes = System.Text.Encoding.GetEncoding("Cyrillic").GetBytes(txt);
return System.Text.Encoding.ASCII.GetString(bytes);
}
Here is a way to generate a URL slug in C#. The function removes all accents (as in Marcel's answer), replaces spaces, removes invalid characters, trims dashes and underscores from both ends, and collapses repeated "-" or "_" into a single one.
Code:
using System.Text;
using System.Text.RegularExpressions;
public static string ToUrlSlug(string value){
//First to lower case
value = value.ToLowerInvariant();
//Remove all accents
var bytes = Encoding.GetEncoding("Cyrillic").GetBytes(value);
value = Encoding.ASCII.GetString(bytes);
//Replace spaces
value = Regex.Replace(value, @"\s", "-", RegexOptions.Compiled);
//Remove invalid chars ('-' is placed last in the class so it is a literal, not a range)
value = Regex.Replace(value, @"[^a-z0-9\s_-]", "", RegexOptions.Compiled);
//Trim dashes and underscores from both ends
value = value.Trim('-', '_');
//Replace double occurrences of - or _
value = Regex.Replace(value, @"([-_]){2,}", "$1", RegexOptions.Compiled);
return value;
}
Here is my rendition, based on Joan's and Marcel's answers. The changes I made are as follows:
Use a widely accepted method to remove accents.
Explicit Regex caching for modest speed improvements.
More word separators recognized and normalized to hyphens.
Here is the code:
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
public class UrlSlugger
{
// white space, em-dash, en-dash, underscore
static readonly Regex WordDelimiters = new Regex(@"[\s—–_]", RegexOptions.Compiled);
// characters that are not valid
static readonly Regex InvalidChars = new Regex(@"[^a-z0-9\-]", RegexOptions.Compiled);
// multiple hyphens
static readonly Regex MultipleHyphens = new Regex(@"-{2,}", RegexOptions.Compiled);
public static string ToUrlSlug(string value)
{
// convert to lower case
value = value.ToLowerInvariant();
// remove diacritics (accents)
value = RemoveDiacritics(value);
// ensure all word delimiters are hyphens
value = WordDelimiters.Replace(value, "-");
// strip out invalid characters
value = InvalidChars.Replace(value, "");
// replace multiple hyphens (-) with a single hyphen
value = MultipleHyphens.Replace(value, "-");
// trim hyphens (-) from ends
return value.Trim('-');
}
/// See: http://www.siao2.com/2007/05/14/2629747.aspx
private static string RemoveDiacritics(string stIn)
{
string stFormD = stIn.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for (int ich = 0; ich < stFormD.Length; ich++)
{
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
if (uc != UnicodeCategory.NonSpacingMark)
{
sb.Append(stFormD[ich]);
}
}
return (sb.ToString().Normalize(NormalizationForm.FormC));
}
}
This still does not solve the non-Latin character issue. A completely different solution would be to use Uri.EscapeDataString to convert the string to its percent-encoded (hex) representation:
string original = "测试公司";
// %E6%B5%8B%E8%AF%95%E5%85%AC%E5%8F%B8
string converted = Uri.EscapeDataString(original);
Then use the data to generate a hyperlink:
<a href="http://www.example.com/100/%E6%B5%8B%E8%AF%95%E5%85%AC%E5%8F%B8">
测试公司
</a>
Many browsers will display the Chinese characters in the address bar, but based on my limited testing, this is not universally supported.
NOTE: In order for Uri.EscapeDataString to work this way, iriParsing must be enabled.
EDIT
For those looking to generate URL Slugs in C#, I recommend checking out this related question:
How does Stack Overflow generate its SEO-friendly URLs?
It is what I ended up using for my project.
One problem I've had with slugification (new word!) is collisions. If I have a blog post, for instance, called "Stack-Overflow" and one called "Stack Overflow", the slugs of those two titles are the same. Therefore, my slug generator usually has to involve the database in some way. This might be why you don't see more generic solutions out there.
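One common way to handle such collisions is to append a numeric suffix until the slug is free. The sketch below is an illustration under stated assumptions: `SlugDeduplicator` and `MakeUnique` are hypothetical names, and the in-memory `ISet<string>` stands in for the database lookup mentioned above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; 'taken' stands in for a database query on the slug column.
public static class SlugDeduplicator
{
    // Returns baseSlug if it is free, otherwise baseSlug-2, baseSlug-3, ...
    public static string MakeUnique(string baseSlug, ISet<string> taken)
    {
        string candidate = baseSlug;
        int suffix = 2;
        while (taken.Contains(candidate))
        {
            candidate = baseSlug + "-" + suffix;
            suffix++;
        }
        // Reserve the slug so the next title with the same base gets a new suffix.
        taken.Add(candidate);
        return candidate;
    }
}
```

With a real database, a unique constraint on the slug column is still needed, because two requests can check the same candidate concurrently.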
Here is my shot at it. It supports:
removal of diacritics (so we don't just remove "invalid" characters)
max length for the result (or before removal of diacritics - "early truncate")
custom separator between normalized chunks
the result can be forced to uppercase or lowercase
configurable list of supported unicode categories
configurable list of ranges of allowed characters
supports .NET Framework 2.0
Code:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
/// <summary>
/// Defines a set of utilities for creating slug urls.
/// </summary>
public static class Slug
{
/// <summary>
/// Creates a slug from the specified text.
/// </summary>
/// <param name="text">The text. If null is specified, null will be returned.</param>
/// <returns>
/// A slugged text.
/// </returns>
public static string Create(string text)
{
return Create(text, (SlugOptions)null);
}
/// <summary>
/// Creates a slug from the specified text.
/// </summary>
/// <param name="text">The text. If null is specified, null will be returned.</param>
/// <param name="options">The options. May be null.</param>
/// <returns>A slugged text.</returns>
public static string Create(string text, SlugOptions options)
{
if (text == null)
return null;
if (options == null)
{
options = new SlugOptions();
}
string normalised;
if (options.EarlyTruncate && options.MaximumLength > 0 && text.Length > options.MaximumLength)
{
normalised = text.Substring(0, options.MaximumLength).Normalize(NormalizationForm.FormD);
}
else
{
normalised = text.Normalize(NormalizationForm.FormD);
}
int max = options.MaximumLength > 0 ? Math.Min(normalised.Length, options.MaximumLength) : normalised.Length;
StringBuilder sb = new StringBuilder(max);
for (int i = 0; i < normalised.Length; i++)
{
char c = normalised[i];
UnicodeCategory uc = char.GetUnicodeCategory(c);
if (options.AllowedUnicodeCategories.Contains(uc) && options.IsAllowed(c))
{
switch (uc)
{
case UnicodeCategory.UppercaseLetter:
if (options.ToLower)
{
c = options.Culture != null ? char.ToLower(c, options.Culture) : char.ToLowerInvariant(c);
}
sb.Append(options.Replace(c));
break;
case UnicodeCategory.LowercaseLetter:
if (options.ToUpper)
{
c = options.Culture != null ? char.ToUpper(c, options.Culture) : char.ToUpperInvariant(c);
}
sb.Append(options.Replace(c));
break;
default:
sb.Append(options.Replace(c));
break;
}
}
else if (uc == UnicodeCategory.NonSpacingMark)
{
// don't add a separator
}
else
{
if (options.Separator != null && !EndsWith(sb, options.Separator))
{
sb.Append(options.Separator);
}
}
if (options.MaximumLength > 0 && sb.Length >= options.MaximumLength)
break;
}
string result = sb.ToString();
if (options.MaximumLength > 0 && result.Length > options.MaximumLength)
{
result = result.Substring(0, options.MaximumLength);
}
if (!options.CanEndWithSeparator && options.Separator != null && result.EndsWith(options.Separator))
{
result = result.Substring(0, result.Length - options.Separator.Length);
}
return result.Normalize(NormalizationForm.FormC);
}
private static bool EndsWith(StringBuilder sb, string text)
{
if (sb.Length < text.Length)
return false;
for (int i = 0; i < text.Length; i++)
{
if (sb[sb.Length - 1 - i] != text[text.Length - 1 - i])
return false;
}
return true;
}
}
/// <summary>
/// Defines options for the Slug utility class.
/// </summary>
public class SlugOptions
{
/// <summary>
/// Defines the default maximum length. Currently equal to 80.
/// </summary>
public const int DefaultMaximumLength = 80;
/// <summary>
/// Defines the default separator. Currently equal to "-".
/// </summary>
public const string DefaultSeparator = "-";
private bool _toLower;
private bool _toUpper;
/// <summary>
/// Initializes a new instance of the <see cref="SlugOptions"/> class.
/// </summary>
public SlugOptions()
{
MaximumLength = DefaultMaximumLength;
Separator = DefaultSeparator;
AllowedUnicodeCategories = new List<UnicodeCategory>();
AllowedUnicodeCategories.Add(UnicodeCategory.UppercaseLetter);
AllowedUnicodeCategories.Add(UnicodeCategory.LowercaseLetter);
AllowedUnicodeCategories.Add(UnicodeCategory.DecimalDigitNumber);
AllowedRanges = new List<KeyValuePair<short, short>>();
AllowedRanges.Add(new KeyValuePair<short, short>((short)'a', (short)'z'));
AllowedRanges.Add(new KeyValuePair<short, short>((short)'A', (short)'Z'));
AllowedRanges.Add(new KeyValuePair<short, short>((short)'0', (short)'9'));
}
/// <summary>
/// Gets the allowed unicode categories list.
/// </summary>
/// <value>
/// The allowed unicode categories list.
/// </value>
public virtual IList<UnicodeCategory> AllowedUnicodeCategories { get; private set; }
/// <summary>
/// Gets the allowed ranges list.
/// </summary>
/// <value>
/// The allowed ranges list.
/// </value>
public virtual IList<KeyValuePair<short, short>> AllowedRanges { get; private set; }
/// <summary>
/// Gets or sets the maximum length.
/// </summary>
/// <value>
/// The maximum length.
/// </value>
public virtual int MaximumLength { get; set; }
/// <summary>
/// Gets or sets the separator.
/// </summary>
/// <value>
/// The separator.
/// </value>
public virtual string Separator { get; set; }
/// <summary>
/// Gets or sets the culture for case conversion.
/// </summary>
/// <value>
/// The culture.
/// </value>
public virtual CultureInfo Culture { get; set; }
/// <summary>
/// Gets or sets a value indicating whether the string can end with a separator string.
/// </summary>
/// <value>
/// <c>true</c> if the string can end with a separator string; otherwise, <c>false</c>.
/// </value>
public virtual bool CanEndWithSeparator { get; set; }
/// <summary>
/// Gets or sets a value indicating whether the string is truncated before normalization.
/// </summary>
/// <value>
/// <c>true</c> if the string is truncated before normalization; otherwise, <c>false</c>.
/// </value>
public virtual bool EarlyTruncate { get; set; }
/// <summary>
/// Gets or sets a value indicating whether to lowercase the resulting string.
/// </summary>
/// <value>
/// <c>true</c> if the resulting string must be lowercased; otherwise, <c>false</c>.
/// </value>
public virtual bool ToLower
{
get
{
return _toLower;
}
set
{
_toLower = value;
if (_toLower)
{
_toUpper = false;
}
}
}
/// <summary>
/// Gets or sets a value indicating whether to uppercase the resulting string.
/// </summary>
/// <value>
/// <c>true</c> if the resulting string must be uppercased; otherwise, <c>false</c>.
/// </value>
public virtual bool ToUpper
{
get
{
return _toUpper;
}
set
{
_toUpper = value;
if (_toUpper)
{
_toLower = false;
}
}
}
/// <summary>
/// Determines whether the specified character is allowed.
/// </summary>
/// <param name="character">The character.</param>
/// <returns>true if the character is allowed; false otherwise.</returns>
public virtual bool IsAllowed(char character)
{
foreach (var p in AllowedRanges)
{
if (character >= p.Key && character <= p.Value)
return true;
}
return false;
}
/// <summary>
/// Replaces the specified character by a given string.
/// </summary>
/// <param name="character">The character to replace.</param>
/// <returns>a string.</returns>
public virtual string Replace(char character)
{
return character.ToString();
}
}
