I have a small problem and I would like to get your opinion.
I'm dealing with documents that can reference other documents. Starting from any document, I need to get the IDs of all the documents this document references. The problem is that circular references are allowed, so if A references B and B references C, C can reference A again and I end up in a loop.
How can I solve this problem in C#?
A small example:
Let's suppose that this is a class that represents a document:
public class Document
{
public Document(int id)
{
this.ID = id;
}
private int m_ID;
public int ID
{
get { return m_ID; }
set { m_ID = value; }
}
private List<Document> m_Children = new List<Document>();
public List<Document> Children
{
get { return m_Children; }
set { m_Children = value; }
}
private List<Document> m_Parent = new List<Document>();
public List<Document> Parent
{
get { return m_Parent; }
set { m_Parent = value; }
}
public Document AddChild(Document child)
{
child.Parent.Add(this);
this.Children.Add(child);
return child;
}
public Document AddChild(int child)
{
Document d = new Document(child);
return AddChild(d);
}
}
Now let's create a Document instance that has some references:
public static Document CreateReferences()
{
Document d = new Document(1);
Document temp = d.AddChild(2);
for (int i = 3; i < 6; i++)
{
temp = temp.AddChild(i);
}
temp.AddChild(d);
return d;
}
Now I need to implement a method in Document class like
public List<int> GetReferencedDocuments()
{ }
What is the best way to do that? Is there a specific algorithm that can be implemented?
Any suggestion is welcome!
Thanks
Any tree-traversal algorithm would be fine.
As well as the list of docs you're going to build up, maintain a queue of documents you've yet to check, and add the first document to that queue.
Then, while the queue isn't empty, get the next doc; if it's not already in your list, add it, and add all the docs it references to your queue.
List<Document> FoundDocs = new List<Document>();
Queue<Document> DocsToSearch = new Queue<Document>();
DocsToSearch.Enqueue(StartDoc);
while(DocsToSearch.Count != 0)
{
Document Doc = DocsToSearch.Dequeue();
if(!FoundDocs.Contains(Doc))
{
FoundDocs.Add(Doc);
foreach(var ChildDoc in Doc.Children)
{
DocsToSearch.Enqueue(ChildDoc);
}
}
}
The best way is to do a depth-first search or a breadth-first search.
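A minimal sketch of the depth-first variant. DocumentNode is a pared-down, hypothetical stand-in for the question's Document class so that the snippet compiles on its own; the explicit stack and the HashSet of visited IDs are assumptions of this sketch, not something the question prescribes.

```csharp
using System.Collections.Generic;

// Pared-down stand-in for the question's Document class (hypothetical name).
public class DocumentNode
{
    public int ID { get; }
    public List<DocumentNode> Children { get; } = new List<DocumentNode>();

    public DocumentNode(int id) { ID = id; }

    // Depth-first search with an explicit stack. The visited set is what
    // stops the walk from looping forever on circular references.
    public List<int> GetReferencedDocuments()
    {
        var result = new List<int>();
        var visited = new HashSet<int> { ID };   // the start document is never reported
        var stack = new Stack<DocumentNode>();
        stack.Push(this);

        while (stack.Count > 0)
        {
            var current = stack.Pop();
            foreach (var child in current.Children)
            {
                // HashSet.Add returns false when the ID was already seen.
                if (visited.Add(child.ID))
                {
                    result.Add(child.ID);
                    stack.Push(child);
                }
            }
        }
        return result;
    }
}
```

Swapping the Stack for a Queue turns the same loop into a breadth-first search; the visited set is needed either way.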
There are two main approaches to resolving this sort of recursive search on recursive data: marking or recording.
Marking: every time you list a document, flag it as viewed. Do not process flagged documents.
So your GetReferencedDocuments would look a little like this:
GetReferencedDocuments(startpoint)
    if (startpoint.flagged) return null
    startpoint.flagged = true
    new list result = [startpoint]
    foreach (subdocument in startpoint.children)
        result.append(GetReferencedDocuments(subdocument)) // if not null
    return result
Recording: a similar approach, but the flag indicators are replaced by a list of already referenced documents (a separate list of IDs, maybe), and the flag check is a search on this list for this document.
Either way will work, depending on your objects, size and scale. If you cannot change the document objects, you will have to list them. If you have, potentially, 1M documents in your scan, you do not want to list them.
Example implementation:
public List<int> GetReferencedDocuments()
{
var referencedIds = new List<int>();
var queue = new Queue<Document>();
queue.Enqueue(this);
while (queue.Count > 0)
{
var newDocuments = queue.Dequeue().Children
.Where(d => !referencedIds.Contains(d.ID));
foreach (Document newDocument in newDocuments)
{
queue.Enqueue(newDocument);
referencedIds.Add(newDocument.ID);
}
}
return referencedIds;
}
(This problem is an adaptation of a real-life scenario. I reduced the problem so it is easy to understand; otherwise this question would be 10000 lines long.)
I have a pipe delimited text file that looks like this (the header is not in the file):
Id|TotalAmount|Reference
1|10000
2|50000
3|5000|1
4|5000|1
5|10000|2
6|10000|2
7|500|9
8|500|9
9|1000
The reference is optional and is the Id of another entry in this text file. Entries that have a reference are considered "children" of that reference, and the reference is their parent. I need to validate each parent in the file, and the validation is that the sum of the TotalAmount of its children should be equal to the parent's total amount. Parents can come either before or after their children in the file, like the entry with Id 9, which comes after its children.
In the provided file, the entry with Id 1 is valid, because the sum of the total amount of its children (Ids 3 and 4) is 10000, and the entry with Id 2 is invalid, because the sum of its children (Ids 5 and 6) is 20000.
For a small file like this, I could just parse everything to objects like this (pseudo code, I don't have a way to run it now):
class Entry
{
public int Id { get; set; }
public int TotalAmout { get; set; }
public int Reference { get; set; }
}
class Validator
{
public void Validate()
{
List<Entry> entries = GetEntriesFromFile(@"C:\entries.txt");
foreach (var entry in entries)
{
var children = entries.Where(e => e.Reference == entry.Id).ToList();
if (children.Count > 0)
{
var sum = children.Sum(e => e.TotalAmout);
if (sum == entry.TotalAmout)
{
Console.WriteLine("Entry with Id {0} is valid", entry.Id);
}
else
{
Console.WriteLine("Entry with Id {0} is INVALID", entry.Id);
}
}
else
{
Console.WriteLine("Entry with Id {0} is valid", entry.Id);
}
}
}
public List<Entry> GetEntriesFromFile(string file)
{
var entries = new List<Entry>();
using (var r = new StreamReader(file))
{
while (!r.EndOfStream)
{
var line = r.ReadLine();
var splited = line.Split('|');
var entry = new Entry();
entry.Id = int.Parse(splited[0]);
entry.TotalAmout = int.Parse(splited[1]);
if (splited.Length == 3)
{
entry.Reference = int.Parse(splited[2]);
}
entries.Add(entry);
}
}
return entries;
}
}
The problem is that I am dealing with large files (10 GB), and that would load way too many objects in memory.
Performance itself is NOT a concern here. I know that I could use dictionaries instead of the Where() method, for example. My only problem now is performing the validation without loading everything into memory, and I don't have any idea how to do it, because an entry at the bottom of the file may have a reference to an entry at the top, so I need to keep track of everything.
So my question is: is it possible to keep track of each line in a text file without loading its information into memory?
Since performance is not an issue here, I would approach this in the following way:
First, I would sort the file so all the parents go right before their children. There are classical methods for sorting huge external data, see https://en.wikipedia.org/wiki/External_sorting
After that, the task becomes pretty trivial: read a parent's data, remember it, read and sum its children's data one by one, compare, repeat.
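A rough sketch of that final pass, assuming the external sort has already been done and every parent line is immediately followed by all of its children. SortedEntryValidator and FindInvalidParents are hypothetical names, and the sketch does not check that a child's reference actually matches the parent above it.

```csharp
using System.Collections.Generic;

public static class SortedEntryValidator
{
    // Assumes `lines` is already ordered so that every parent line is
    // immediately followed by its children (the external sort's job).
    // Returns the Ids of parents whose children do not sum to their total.
    public static List<int> FindInvalidParents(IEnumerable<string> lines)
    {
        var invalid = new List<int>();
        int? parentId = null;
        long parentTotal = 0, childSum = 0;
        bool hasChildren = false;

        foreach (var line in lines)
        {
            var parts = line.Split('|');
            if (parts.Length == 3)
            {
                // Child line: only the running sum is kept in memory.
                childSum += long.Parse(parts[1]);
                hasChildren = true;
                continue;
            }

            // A new parent starts: settle the previous one first.
            if (parentId.HasValue && hasChildren && childSum != parentTotal)
                invalid.Add(parentId.Value);

            parentId = int.Parse(parts[0]);
            parentTotal = long.Parse(parts[1]);
            childSum = 0;
            hasChildren = false;
        }

        // Settle the last parent in the file.
        if (parentId.HasValue && hasChildren && childSum != parentTotal)
            invalid.Add(parentId.Value);

        return invalid;
    }
}
```

Passing File.ReadLines(path) as the argument keeps memory use at one line plus a handful of counters, because the lines are read lazily.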
All you really need to keep in memory is the expected total for each non-child entity, and the running sum of the child totals for each parent entity. Everything else you can throw out, and if you use the File.ReadLines API, you can stream over the file and 'forget' each line once you've processed it. Since the lines are read on demand, you don't have to keep the entire file in memory.
public class Entry
{
public int Id { get; set; }
public int TotalAmount { get; set; }
public int? Reference { get; set; }
}
public static class EntryValidator
{
public static void Validate(string file)
{
var entries = GetEntriesFromFile(file);
var childAmounts = new Dictionary<int, int>();
var nonChildAmounts = new Dictionary<int, int>();
foreach (var e in entries)
{
if (e.Reference is int p)
childAmounts.AddOrUpdate(p, e.TotalAmount, (_, n) => n + e.TotalAmount);
else
nonChildAmounts[e.Id] = e.TotalAmount;
}
foreach (var id in nonChildAmounts.Keys)
{
var expectedTotal = nonChildAmounts[id];
if (childAmounts.TryGetValue(id, out var childTotal) &&
childTotal != expectedTotal)
{
Console.WriteLine($"Entry with Id {id} is INVALID");
}
else
{
Console.WriteLine($"Entry with Id {id} is valid");
}
}
}
private static IEnumerable<Entry> GetEntriesFromFile(string file)
{
foreach (var line in File.ReadLines(file))
yield return GetEntryFromLine(line);
}
private static Entry GetEntryFromLine(string line)
{
var parts = line.Split('|');
var entry = new Entry
{
Id = int.Parse(parts[0]),
TotalAmount = int.Parse(parts[1])
};
if (parts.Length == 3)
entry.Reference = int.Parse(parts[2]);
return entry;
}
}
This uses a nifty extension method for IDictionary<K, V>:
public static class DictionaryExtensions
{
public static TValue AddOrUpdate<TKey, TValue>(
this IDictionary<TKey, TValue> dictionary,
TKey key,
TValue addValue,
Func<TKey, TValue, TValue> updateCallback)
{
if (dictionary == null)
throw new ArgumentNullException(nameof(dictionary));
if (updateCallback == null)
throw new ArgumentNullException(nameof(updateCallback));
if (dictionary.TryGetValue(key, out var value))
value = updateCallback(key, value);
else
value = addValue;
dictionary[key] = value;
return value;
}
}
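If you would rather not carry an extension method around, the same add-or-accumulate step can be sketched with a plain TryGetValue call. ChildTotals and Accumulate are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;

public static class ChildTotals
{
    // Adds `amount` to the running total stored under `parentId`,
    // starting from zero when the parent has not been seen yet.
    public static void Accumulate(Dictionary<int, int> totals, int parentId, int amount)
    {
        // When the key is missing, TryGetValue leaves `current` at default(int), i.e. 0.
        totals.TryGetValue(parentId, out var current);
        totals[parentId] = current + amount;
    }
}
```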
I have a project at my university and I stumbled upon a problem I am not able to solve.
About the program: I need to create a list of tasks (they can be private or business tasks). I need a function that returns a list of ONLY private tasks and another function that returns a list of ONLY business tasks.
So I have a class "Task" that contains "next" and "prev" connections. The classes "PrivateTask" and "BusinessTask" inherit this class. I also have a class ToDoList where I actually try to create the list.
class ToDoList
{
Task first = null;
Task last = null;
//adds new tasks and sorts them right away
public void AddSorted(Task newTask)
{
if(first == null)
{
first = newTask;
last = newTask;
}
else
{
if(newTask < first)
{
Prepend(newTask);
}
else if(newTask > last)
{
Append(newTask);
}
else
{
Task loopTask = first;
while(newTask > loopTask)
{
loopTask = loopTask.next;
}
AddBefore(loopTask, newTask);
}
}
}
//adds a new task before another chosen task
private void AddBefore(Task Next, Task newTask)
{
newTask.prev = Next.prev;
newTask.next = Next;
Next.prev.next = newTask;
Next.prev = newTask;
}
//adds at the start of the list
private void Prepend(Task newTask)
{
first.prev = newTask;
newTask.next = first;
first = newTask;
}
//adds at the end of the list
private void Append(Task newTask)
{
last.next = newTask;
newTask.prev = last;
last = newTask;
}
And now I need to return a list of BusinessTasks
//returns a list of business tasks
public ToDoList GetBusinessList()
{
ToDoList busList = new ToDoList();
Task loopTask = first;
while(loopTask != null)
{
if(loopTask is BusinessTask)
{
busList.AddSorted(loopTask);
}
loopTask = loopTask.next;
}
return busList;
}
But when I return this list the whole content of the main list synchronizes with this one and I cannot understand why.
You aren't putting copies of your tasks into your new list, you are putting references into the new list. As a result, you are changing the same objects. So when you push an item from your first list into the second list and as a result next and/or prev gets changed, you are changing both lists.
So you need to copy the item from your original list and put the new item in the second list.
while(loopTask != null)
{
if(loopTask is BusinessTask)
{
var clone = loopTask.Clone();
busList.AddSorted(clone);
}
loopTask = loopTask.next;
}
Now obviously you'll need to implement a Clone method that will copy all the properties except those that relate to the position in the list (prev and next) to a new instance of BusinessTask
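One possible shape for such a Clone method, as a sketch only. The question does not show what data the tasks carry, so Title and Client below are hypothetical properties, and Task and BusinessTask are reduced to stand-ins.

```csharp
// Hypothetical shape of the classes; Title and Client stand in for
// whatever data the real tasks carry.
public abstract class Task
{
    public string Title { get; set; }
    public Task next;
    public Task prev;

    // Each subclass returns a copy of its data with next/prev left null.
    public abstract Task Clone();
}

public class BusinessTask : Task
{
    public string Client { get; set; }

    public override Task Clone()
    {
        // Copy the data only; the clone starts out unlinked, so inserting
        // it into another list cannot disturb the original list's links.
        return new BusinessTask { Title = Title, Client = Client };
    }
}
```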
Now, if you actually want the objects in both lists to be references to the same object, so that changing a property on one will change the other, then you can get clever by separating out the data part from the list node part. So you could do something like:
public class TaskBase
{
public string SomeProperty { get; set; }
}
public class Node
{
public TaskBase Data { get; private set;}
public Node Next { get; set; }
public Node Prev { get; set; }
public Node(TaskBase data)
{
Data = data;
}
public Node Clone()
{
// Now all the data part is the same object
// so changing Data.SomeProperty in one list will be
// reflected in both. But the Next and Prev properties
// are independent.
return new Node(Data);
}
}
And then your loop might look like this:
while(loopTask != null)
{
if(loopTask.Data is BusinessTask) // assuming BusinessTask derives from TaskBase
{
var clone = loopTask.Clone();
// clone contains the same BusinessTask, but its position in the new list
// won't mess up the old list.
busList.AddSorted(clone);
}
loopTask = loopTask.Next;
}
I built a class Cluster as follows:
public class Cluster
{
List<Cluster> lstChildClusters=new List<Cluster>();
public List<Cluster> LstChildClusters
{
get { return lstChildClusters; }
set { lstChildClusters = value; }
}
public classA alr;
}
My goal is to build a function that gets all the grandchildren of an object of Cluster type. Basically, a father can have 0 or more sons, which can in turn have 0 or more sons.
I tried to build a recursive function, but all it gives back is a single grandchild when I use the code below.
Here is the function I built:
public List<classA> getLevel0Clusters(Cluster cluster,List<classA> list)
{
if (cluster.LstChildClusters.Count == 0)
{
list.Add(cluster.alr);
return (list);
}
else
{
for (int i = 0; i < lstChildClusters.Count - 1; i++)
{
return (lstChildClusters[i].getLevel0Clusters(lstChildClusters[i], list));
}
return (lstChildClusters[0].getLevel0Clusters(lstChildClusters[0], list));
}
}
I am using those instances for debugging:
Cluster father = new Cluster();
father.Alr = new Alarm("father");
Cluster son1 = new Cluster();
son1.Alr = new Alarm("son1");
Cluster son2 = new Cluster();
son2.Alr = new Alarm("son2");
Cluster grandson1 = new Cluster();
grandson1.Alr = new Alarm("grandson1");
Cluster grandson2 = new Cluster();
grandson2.Alr = new Alarm("grandson2");
father.LstChildClusters.Add(son1);
father.LstChildClusters.Add(son2);
son1.LstChildClusters.Add(grandson1);
son1.LstChildClusters.Add(grandson2);
List<classA> lst=new lst<ClassA>();
lst=father.getLevel0Clusters(father, father.LstAlarms);
Does anybody have any clue on how to troubleshoot this problem?
Thank you in advance
There are a number of problems with your existing code, so I've done a bit of refactoring to make your program simpler.
But first, to answer your direct question, the problem with your existing method is that you're calling return before you finish aggregating all of the results. Your code looks at father and sees that it has children, so it enters the for loop and recursively calls itself for son1. It sees that son1 has children, so it enters the for loop and recursively calls itself for grandson1, which doesn't have children, so it adds grandson1 to the list and then returns. The call for son1 returns as soon as it gets that first value, and the call for father then does the same. Hence the list only has grandson1.
So, to refactor your code: The getLevel0Clusters method does not need to pass in a Cluster (as it is defined in the Cluster class it can use this) and a List<classA> (as it can generate one as needed).
So your getLevel0Clusters can become simply this:
public List<classA> getLevel0Clusters()
{
return new[] { this.alr, }
.Concat(this.LstChildClusters
.SelectMany(child => child.getLevel0Clusters()))
.ToList();
}
In order to get everything to compile I modified your sample code to be this:
Cluster father = new Cluster();
father.alr = new classA("father");
Cluster son1 = new Cluster();
son1.alr = new classA("son1");
Cluster son2 = new Cluster();
son2.alr = new classA("son2");
Cluster grandson1 = new Cluster();
grandson1.alr = new classA("grandson1");
Cluster grandson2 = new Cluster();
grandson2.alr = new classA("grandson2");
father.LstChildClusters.Add(son1);
father.LstChildClusters.Add(son2);
son1.LstChildClusters.Add(grandson1);
son1.LstChildClusters.Add(grandson2);
List<classA> lst = father.getLevel0Clusters();
...and your classes as this:
public class Cluster
{
List<Cluster> lstChildClusters = new List<Cluster>();
public List<Cluster> LstChildClusters
{
get { return lstChildClusters; }
set { lstChildClusters = value; }
}
public classA alr;
public List<classA> getLevel0Clusters()
{
return new[] { this.alr, }
.Concat(this.LstChildClusters
.SelectMany(child => child.getLevel0Clusters()))
.ToList();
}
}
public class classA
{
public string Name;
public classA(string name)
{
this.Name = name;
}
}
When I ran your sample code, the list came back with five items, in this order: father, son1, grandson1, grandson2, son2.
The problem is that as soon as you find one offspring, you return to the calling program. The value of Count has no effect other than 0 versus positive: you enter the loop, call lstChildClusters[0].getLevel0Clusters(lstChildClusters[0], list), and return that value without bothering to increment i and continue the loop.
Instead, your for loop has to add each return value to the list. After the loop is done, you can return to the calling program.
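A small sketch of that fix. ClusterNode is a reduced, hypothetical stand-in for the question's Cluster, with a plain Name string in place of the classA payload, and it assumes the goal is to collect only the clusters that have no children.

```csharp
using System.Collections.Generic;

// Reduced stand-in for the question's Cluster; Name replaces the classA payload.
public class ClusterNode
{
    public string Name { get; }
    public List<ClusterNode> Children { get; } = new List<ClusterNode>();

    public ClusterNode(string name) { Name = name; }

    // Collects the names of all leaf clusters (those without children).
    public List<string> GetLeafNames()
    {
        var result = new List<string>();
        CollectLeaves(this, result);
        return result;
    }

    private static void CollectLeaves(ClusterNode node, List<string> result)
    {
        if (node.Children.Count == 0)
        {
            result.Add(node.Name);
            return;
        }
        // No return inside the loop: every child gets visited.
        foreach (var child in node.Children)
            CollectLeaves(child, result);
    }
}
```

With the question's test data (father, two sons, two grandsons under son1) this collects grandson1, grandson2 and son2, since son2 has no children of its own.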
I am fairly new to programming in C# and am currently attempting to write the generic classes Graph and GraphNode, which I have included below. I understand the logic behind the methods IsAdjacent and GetNodeByID; however, I am not too sure how to code these correctly in C#, so I have included a small bit of pseudocode in these methods. This, however, is not the case with the AddEdge method. If possible, could you provide me with a solution to these three methods?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Graph
{
public class GraphNode<T>
{
private T id; //data stored in graph
private LinkedList<T> adjList; //adjacency list
//constructor
public GraphNode(T id)
{
this.id = id;
adjList = new LinkedList<T>();
}
//add an edge from this node : add to to the adjacency list
public void AddEdge(GraphNode<T> to)
{
adjList.AddFirst(to.ID);
}
//set and get for ID – data stored in graph
public T ID
{
set { id = value; }
get { return id; }
}
//returns adjacency list – useful for traversal methods
public LinkedList<T> GetAdjList()
{
return adjList;
}
}
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Graph
{
public class Graph<T> where T : IComparable
{
//list of GraphNodes in this graph
private LinkedList<GraphNode<T>> nodes;
//constructor - set nodes to new empty list
public Graph()
{
nodes = new LinkedList<GraphNode<T>>();
}
//only return true if the graph’s list of nodes is empty
public bool IsEmptyGraph()
{
return nodes.Count == 0;
}
//Search through list of nodes for node
//Node will be a new graphnode with the
// containing the ID to be search for
public bool ContainsGraph(GraphNode<T> node)
{
//search based on ID
foreach (GraphNode<T> n in nodes)
{
if (n.ID.CompareTo(node.ID) == 0)
return true;
}
return false;
}
//find from in list of nodes and search its adjList for to
public bool IsAdjacent(GraphNode<T> from, GraphNode<T> to)
{
foreach(GraphNode<T> n in nodes)
{
if (n.ID same as from.ID)
{ if (from.AdjList contains to.ID)
return true;
}
return false;
}
}
//add a new graphNode to list of nodes
public void AddNode(T id)
{
GraphNode<T> n = new GraphNode<T>(id);
nodes.AddFirst(n);
}
//Search through list of nodes for node with this ID
public GraphNode<T> GetNodeByID(T id)
{
foreach( GraphNode<T> n in nodes )
{
if (id = n.ID)
{
return n;
}
}
return null;
}
//find from in list of nodes (look at other methods)
//and call graphNode method to add an edge to to
//think about validation here
public void AddEdge(T from, T to)
{
}
//perform a DFS traversal starting at startID, leaving a list
//of visitied ID’s in the visited list.
}
}
Many Thanks
A couple of notes:
A few of your methods take the node "ID" rather than the node itself. Wouldn't it be easier just to use the node?
Any good reason for using LinkedList rather than List for most of these items? There's an SO discussion about this here and it's not obvious what LinkedList brings to your implementation.
With adjacency lists, your AddEdge function needs to take two inputs, the source node and the destination node, and add them to each other's adjacency lists. You already have an AddEdge method in your GraphNode class which adds a node to its adjacency list. So, your code will look something like this:
public void AddEdge(GraphNode<T> source, GraphNode<T> destination)
{
source.AddEdge(destination);
destination.AddEdge(source);
}
For IsAdjacent, I'm not clear on why you need to search the entire list of nodes. You just need to check that one node is in the other's adjacency list (which should imply vice versa, assuming it's coded correctly):
public bool IsAdjacent(GraphNode<T> source, GraphNode<T> destination)
{
// The adjacency list stores IDs, so look up the destination's ID.
return source.GetAdjList().Contains(destination.ID);
}
I haven't answered your question about GetNodeByID because of my note above: I'm not sure why it's done by ID rather than by the node itself. However, I don't see a problem with your method if you really want to do it with IDs (although the comparison should be if (id == n.ID) rather than if (id = n.ID); since T is a generic type constrained only to IComparable, id.CompareTo(n.ID) == 0 is the form that compiles).
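If you do want to keep the ID-based design from the question, the three methods could be sketched roughly as follows. SimpleNode and SimpleGraph are hypothetical, trimmed-down versions of the question's classes, and this sketch treats edges as directed; for an undirected graph you would also add the reverse edge.

```csharp
using System;
using System.Collections.Generic;

public class SimpleNode<T>
{
    public T ID { get; }
    public LinkedList<T> AdjList { get; } = new LinkedList<T>();
    public SimpleNode(T id) { ID = id; }
}

public class SimpleGraph<T> where T : IComparable
{
    private readonly LinkedList<SimpleNode<T>> nodes = new LinkedList<SimpleNode<T>>();

    public void AddNode(T id) => nodes.AddFirst(new SimpleNode<T>(id));

    // Returns the node with this ID, or null when there is none.
    public SimpleNode<T> GetNodeByID(T id)
    {
        foreach (var n in nodes)
            if (n.ID.CompareTo(id) == 0)   // == does not compile for a generic T
                return n;
        return null;
    }

    // True when an edge from -> to has been added.
    public bool IsAdjacent(T from, T to)
    {
        var source = GetNodeByID(from);
        if (source == null) return false;
        foreach (var id in source.AdjList)
            if (id.CompareTo(to) == 0)
                return true;
        return false;
    }

    // Adds a directed edge; throws if either endpoint is not in the graph.
    public void AddEdge(T from, T to)
    {
        var source = GetNodeByID(from);
        if (source == null || GetNodeByID(to) == null)
            throw new ArgumentException("Both nodes must exist in the graph.");
        if (!IsAdjacent(from, to))
            source.AdjList.AddFirst(to);
    }
}
```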
I have the following class which recurs on itself to form a tree-like data structure:
public class chartObject
{
public string name { get; set; }
public int descendants { get; set; }
public List<chartObject> children { get; set; }
}
For each object in the tree I would like to populate the descendants property with the number of objects that exist underneath it.
Example structure:
chartObject1 (descendants: 4)
└-chartObject2 (descendants: 0)
└-chartObject3 (descendants: 2)
└--chartObject4 (descendants: 1)
└---chartObject5 (descendants: 0)
What would be the most efficient way of doing this?
How about the recursive formula:
children.Count + children.Sum(c => c.descendants)
This is suitable for eager evaluation / caching if the tree is immutable (which it isn't, judging from the class declaration). If you want efficiency even in the face of mutability, you'll find this a lot more difficult; you can consider marking parts of the tree "dirty" as it is mutated, or eagerly forcing the re-evaluation of this metric to "bubble up" when part of the tree is mutated.
This works for me:
public void SetDescendants(chartObject current)
{
foreach (var child in current.children)
{
SetDescendants(child);
}
current.descendants = current.children.Sum(x => 1 + x.descendants);
}
I tested with this code:
var co = new chartObject()
{
name = "chartObject1",
children = new List<chartObject>()
{
new chartObject()
{
name = "chartObject2",
children = new List<chartObject>() { }
},
new chartObject()
{
name = "chartObject3",
children = new List<chartObject>()
{
new chartObject()
{
name = "chartObject4",
children = new List<chartObject>()
{
new chartObject()
{
name = "chartObject5",
children = new List<chartObject>() { }
}
}
}
}
}
}
};
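And, after calling SetDescendants(co), got the same counts as in the example structure: 4, 0, 2, 1 and 0 for chartObject1 through chartObject5.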
For calculations to be most efficient, cache their result in the node itself. Otherwise, you'll be re-calculating the count every time the descendants property is looked up.
The cost of doing that is the need to invalidate the cache all the way up the parent chain, like this:
public class chartObject
{
private chartObject _parent;
private int? _descCache = null;
public string name { get; set; }
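public int descendants {
get {
// Compute once, then reuse until AddChild invalidates the cache.
if (_descCache == null)
_descCache = calcDescendents();
return _descCache.Value;
}
}
public List<chartObject> children { get; set; } = new List<chartObject>();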
public void AddChild(chartObject child) {
child._parent = this;
children.Add(child);
chartObject tmp = this;
while (tmp != null) {
tmp._descCache = null;
tmp = tmp._parent;
}
}
private int calcDescendents() {
return children.Count+children.Sum(child => child.descendants);
}
}
Walk all nodes of the tree (depth first is OK) and, when done with a node's children, set its "descendants" property to the sum of the children's descendants plus the child count. You have to do it on every change to the tree structure. You should be able to limit updates to the parents of the element that changed.
If the nodes are made immutable, you can populate the field at creation time.
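A sketch of what the immutable option might look like. ImmutableChartNode is a hypothetical replacement for chartObject, not the original class; it relies on children being fully built before their parent.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable variant of chartObject.
public sealed class ImmutableChartNode
{
    public string Name { get; }
    public IReadOnlyList<ImmutableChartNode> Children { get; }
    public int Descendants { get; }

    public ImmutableChartNode(string name, params ImmutableChartNode[] children)
    {
        Name = name;
        Children = children.ToArray();   // defensive copy so callers cannot change it later
        // Children are fully built before their parent, so their counts are final.
        Descendants = Children.Sum(c => 1 + c.Descendants);
    }
}
```

Because nothing can change after construction, the count is computed exactly once per node and never needs invalidating.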
Side notes:
Your tree is mutable as it is now (one can easily add more child nodes anywhere), so it may be safer to have a method that counts descendants instead of a property on a node.
Having the computed property int descendants { get; set; } be read/write is confusing, as anyone can set its value to whatever number. Consider making it read-only and updating it when one of the child nodes changes (this requires some custom notification mechanism).
Code style: consider naming classes with upper-case names for code that is intended to be public (follow Microsoft's C# coding guidelines). chartObject -> ChartObject