Currently I have a structure that looks like:
Name, Type, RefName, RefType
The goal is that the user selects an element and a direction and then gets a tree structure in a web page (JSON data) that displays all elements matching his criteria. As you can imagine, going up is not hard: the tree gets narrower, so performance is no issue. Going down, however, more and more nodes get added, and at a certain point it gets slow. At the moment the nodes are added to a HashSet and I go through them recursively. My program structure for going down is:
private void BuildChildNodes(ElementRefItem element, int goalDepth, int currentDepth = 0)
{
    // Stop once the requested depth is reached.
    if (goalDepth <= currentDepth)
    {
        return;
    }
    currentDepth++;

    // One repository query per node: fetch every row whose parent is this element.
    var elements = refElementRepository.All().Where(x => x.ElementName == element.Name);
    foreach (var refElement in elements)
    {
        var node = CreateNode(refElement.ElementRefName, refElement.ElementRefType);
        BuildChildNodes(node, goalDepth, currentDepth);
        element.ChildNodes.Add(node);
    }
}
I am looking for tips to optimize this part of the code. Are there other ways of doing this faster?
(The actual data structure is a bit more complex, but for clarity it is easier to show this version.)
Instead of performing a query every time, I retrieve the data once (all data is refreshed once a day) and build two hash-based caches: one for parent-child relations and one for child-parent relations. This makes it possible to find an element in the cache (which is very fast) and therefore retrieve the whole tree part. In meta code it looks like this:
- Create a new cache (bottom-up, top-down)
- For each cached object:
  - Check if the items exist (both parent and child)
    - yes -> select the node
    - no -> create the node and insert it into the new cache
  - Add the child to the parent node
This gives a set of parent objects. Now you can select a parent object and automagically get your whole tree section.
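A minimal sketch of the top-down half of that cache, assuming the repository rows also expose an ElementType column (an assumption; the bottom-up cache is built the same way with the columns swapped):

var cache = new Dictionary<(string Name, string Type), ElementRefItem>();

// Rebuilt once per daily refresh. Nodes are keyed on (Name, Type) so the same
// element is created only once and then reused as both parent and child.
foreach (var row in refElementRepository.All())
{
    if (!cache.TryGetValue((row.ElementName, row.ElementType), out var parent))
        cache[(row.ElementName, row.ElementType)] = parent = CreateNode(row.ElementName, row.ElementType);

    if (!cache.TryGetValue((row.ElementRefName, row.ElementRefType), out var child))
        cache[(row.ElementRefName, row.ElementRefType)] = child = CreateNode(row.ElementRefName, row.ElementRefType);

    // Add the child to the parent node.
    parent.ChildNodes.Add(child);
}

// Selecting any node from the cache now yields its whole subtree,
// with no repository queries during the descent.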
So, I have a tree data structure which should work fine, but now I want to print the results. My problem is: if the parent node has two children (right and left) which aren't null, which path should it take? If it takes, let's say, the right one, then all of the left ones after that node get left out.
I want the tree to be in the correct order
Example of what it'll do if I do not order it correctly:
-----------------------------root---------------------------------------
-------------------value---------------skips this value-----------------
-------------value-------skips this value-- skips this value -- skips this value
----------value -------skips this value----- skips this value ---- skips this value
What I'd like it to do:
-----------------------------------root----------------------------------
---------------------------value-----------value-------------------------
----------------------value----value-----value---value-------------------
--------------------value--value--value-value-value-value-value
and so on (I'd continue, but it gets cluttered)
I'll need to place my nodes into a stack and order them in advance. Suggestions?
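For that layout, a breadth-first (level-order) traversal with a queue rather than a stack gives exactly the order described above. A minimal sketch, assuming a hypothetical Node class with Value, Left and Right:

class Node
{
    public string Value;
    public Node Left, Right;
}

static void PrintByLevel(Node root)
{
    if (root == null) return;
    var queue = new Queue<Node>();
    queue.Enqueue(root);
    while (queue.Count > 0)
    {
        int levelSize = queue.Count;              // nodes remaining on this level
        for (int i = 0; i < levelSize; i++)
        {
            var node = queue.Dequeue();
            Console.Write(node.Value + " ");
            if (node.Left != null) queue.Enqueue(node.Left);
            if (node.Right != null) queue.Enqueue(node.Right);
        }
        Console.WriteLine();                      // one printed line per tree level
    }
}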
Let's say I have a list of branches; how do I find the lists of branches that are singly connected together? The diagram below should illustrate my point. The input is a list of branches in a single tree, labeled 1, 2, 3 and so on.
The process is as follows (a rough code sketch follows this list):
- Create a function that accepts a tree node as a parameter.
- If the node has no children, print the value of the current node and return.
- If the node has two children, end the current list of single-node values, then recurse into the left node, then the right node.
- If the node has one child, add it to the current list of values, then recurse into that node.
- Continue.
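A minimal sketch of that procedure, reusing the hypothetical Node shape from the earlier sketch and collecting each singly connected run into its own list (printing could replace the list handling):

static void CollectChains(Node node, List<string> current, List<List<string>> chains)
{
    if (node == null) return;
    current.Add(node.Value);

    if (node.Left == null && node.Right == null)
    {
        chains.Add(current);                        // leaf: the current chain is complete
    }
    else if (node.Left != null && node.Right != null)
    {
        chains.Add(current);                        // branch point: end the current list
        CollectChains(node.Left, new List<string>(), chains);
        CollectChains(node.Right, new List<string>(), chains);
    }
    else
    {
        // Exactly one child: keep extending the current chain.
        CollectChains(node.Left ?? node.Right, current, chains);
    }
}

// Usage: var chains = new List<List<string>>();
//        CollectChains(root, new List<string>(), chains);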
Judging by the image, there is a really simple solution.
Let's make a list whose elements are lists of the same type. The procedure will be called tree_lists(list, tree). All you need to do is:
- Looking at the current joint, you have your list pointer on the first element of the list.
- If there is more than one child in the current node: iterate through each subtree, incrementing the list pointer and calling tree_lists(list[i], current_subtree), where i is the list pointer and current_subtree is the current subtree. =)
- If only one child exists, just add this joint to the current list item and move on to the next.
Of course, the list pointer and list values must somehow be global and be modified in the recursion as well.
What is the best method to loop through TreeView nodes and retrieve a node based on a certain value?
It would be more efficient to create something like a Dictionary<string, TreeNode> and add all the nodes to it. This of course must be done when the form starts, or whenever you add new tree nodes. The dictionary key can be anything, e.g. the TreeNode's Text or the business object associated with the node.
You won't need to traverse all the nodes - just use the search key and retrieve the node.
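A minimal sketch of that idea for a WinForms TreeView; the control name treeView1 and the choice of Text as the key are just for illustration (and assume the keys are unique):

var nodesByKey = new Dictionary<string, TreeNode>();

// Call once after the tree has been populated (and again when nodes are added).
void IndexNodes(TreeNodeCollection nodes)
{
    foreach (TreeNode node in nodes)
    {
        nodesByKey[node.Text] = node;   // key could also be a business-object id
        IndexNodes(node.Nodes);         // include all descendants
    }
}

// Lookup instead of traversal:
// IndexNodes(treeView1.Nodes);
// if (nodesByKey.TryGetValue("SomeValue", out TreeNode found))
//     treeView1.SelectedNode = found;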
Suppose I have a collection (be it an array, generic List, or whatever is the fastest solution to this problem) of a certain class, let's call it ClassFoo:
class ClassFoo
{
    public string word;
    public float score;
    // ... etc ...
}
Assume there's going to be like 50,000 items in the collection, all in memory.
Now I want to obtain, as fast as possible, all the instances in the collection that satisfy a condition on their word member, for example like this:
List<ClassFoo> result = new List<ClassFoo>();
foreach (ClassFoo cf in collection)
{
    if (cf.word.StartsWith(query) || cf.word.EndsWith(query))
        result.Add(cf);
}
How do I get the results as fast as possible? Should I consider some advanced indexing techniques and datastructures?
The application domain for this problem is an autocompleter, that gets a query and gives a collection of suggestions as a result. Assume that the condition doesn't get any more complex than this. Assume also that there's going to be a lot of searches.
With the constraint that the condition clause can be "anything", you're limited to scanning the entire list and applying the condition.
If there are limitations on the condition clause, then you can look at organizing the data to more efficiently handle the queries.
For example, the code sample with the "byFirstLetter" dictionary doesn't help at all with an "endsWith" query.
So, it really comes down to what queries you want to do against that data.
In Databases, this problem is the burden of the "query optimizer". In a typical database, if you have a database with no indexes, obviously every query is going to be a table scan. As you add indexes to the table, the optimizer can use that data to make more sophisticated query plans to better get to the data. That's essentially the problem you're describing.
Once you have a more concrete subset of the types of queries, you can make a better decision as to what structure is best. Also, you need to consider the amount of data. If you have a list of 10 elements, each less than 100 bytes, a scan of everything may well be the fastest thing you can do, since you have such a small amount of data. Obviously that doesn't scale to 1M elements, but even clever access techniques carry a cost in setup, maintenance (like index maintenance), and memory.
EDIT, based on the comment
For an autocompleter: if the data is static, sort it and use a binary search. You're really not going to get faster than that.
If the data is dynamic, then store it in a balanced tree and search that. That's effectively a binary search, and it lets you keep adding data in any order.
Anything else is some specialization on these concepts.
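For the autocompleter's prefix case, that could look roughly like the sketch below: sort the words once with an ordinal comparer (so every word sharing a prefix sits in one contiguous run), binary-search to the start of that run, then walk forward until the prefix stops matching. collection and word are from the question; words and StartingWith are names made up for the sketch.

List<string> words = collection.Select(cf => cf.word)
                               .OrderBy(w => w, StringComparer.Ordinal)
                               .ToList();   // built once, e.g. at load time

IEnumerable<string> StartingWith(string query)
{
    int index = words.BinarySearch(query, StringComparer.Ordinal);
    if (index < 0) index = ~index;           // first element >= query
    while (index > 0 && words[index - 1].StartsWith(query, StringComparison.Ordinal))
        index--;                             // back up over exact duplicates
    while (index < words.Count && words[index].StartsWith(query, StringComparison.Ordinal))
        yield return words[index++];
}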
var Answers = myList.Where(item => item.word.StartsWith(query) || item.word.EndsWith(query));
That's the easiest in my opinion, and it should execute rather quickly.
Not sure I understand... All you can really do is optimize the rule; that's the part that needs to be fastest. You can't speed up the loop without just throwing more hardware at it.
You could parallelize if you have multiple cores or machines.
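With PLINQ that is a one-liner, though for a list of this size the per-item work is so small that the parallel overhead may eat the gain; worth measuring rather than assuming:

// Same filter as the LINQ answer above, spread across the available cores.
var results = collection.AsParallel()
                        .Where(cf => cf.word.StartsWith(query) || cf.word.EndsWith(query))
                        .ToList();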
I'm not up on my Java right now, but I would think about the following things.
How are you creating your list? Perhaps you can create it already ordered in a way which cuts down on comparison time.
If you are just doing a straight loop through your collection, you won't see much difference between storing it as an array or as a linked list.
For storing the results, depending on how you are collecting them, the structure could make a difference (but assuming Java's generic structures are smart, it won't). As I said, I'm not up on my Java, but I assume the generic linked list keeps a tail pointer, in which case it wouldn't really make a difference. Someone with more knowledge of the underlying array vs. linked-list implementations, and how they end up looking in the byte code, could probably tell you whether appending to a linked list with a tail pointer or inserting into an array is faster (my guess would be the array). On the other hand, you would need to know the size of your result set, or sacrifice some storage space and make the array as big as the whole collection you are iterating through.
Optimizing your comparison query by figuring out which comparison is most likely to be true and doing that one first could also help. I.e., if in general 10% of the time a member of the collection starts with your query, and 30% of the time a member ends with the query, you would want to do the end comparison first.
For your particular example, sorting the collection would help, as you could binary-chop to the first item that starts with the query and terminate early when you reach the next one that doesn't; you could also produce a table of pointers to collection items, sorted by the reverse of each string, for the second clause.
In general, if you know the structure of the query in advance, you can sort your collection (or build several sorted indexes for your collection if there are multiple clauses) appropriately; if you do not, you will not be able to do better than linear search.
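The second clause can reuse the prefix machinery: keep a second list holding every word reversed, sorted the same way, and reverse the query before searching, since word.EndsWith(query) is the same test as reversed(word).StartsWith(reversed(query)). A rough sketch of just the extra index (names are illustrative):

// Secondary index for the EndsWith clause: the same words, reversed and sorted once.
List<string> reversedWords = collection
    .Select(cf => new string(cf.word.Reverse().ToArray()))
    .OrderBy(w => w, StringComparer.Ordinal)
    .ToList();

// At query time, reverse the query and run the same binary-search prefix walk
// (as in the earlier StartingWith sketch) against reversedWords.
string reversedQuery = new string(query.Reverse().ToArray());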
If it's something where you populate the list once and then do many lookups (thousands or more) then you could create some kind of lookup dictionary that maps starts with/ends with values to their actual values. That would be a fast lookup, but would use much more memory. If you aren't doing that many lookups or know you're going to be repopulating the list at least semi-frequently I'd go with the LINQ query that CQ suggested.
You can create some sort of index, and it might get faster.
We can build an index like this:
var indexByFirstLetter = new Dictionary<char, List<ClassFoo>>();
foreach (var cf in collection)
{
    // Index each word under both its first and its last character.
    foreach (char key in new[] { cf.word[0], cf.word[cf.word.Length - 1] })
    {
        if (!indexByFirstLetter.ContainsKey(key))
            indexByFirstLetter[key] = new List<ClassFoo>();
        indexByFirstLetter[key].Add(cf);
    }
}
Then use it like this:
// Candidates that could start with the query sit under its first character;
// candidates that could end with it sit under its last character.
List<ClassFoo> starts, ends;
indexByFirstLetter.TryGetValue(query[0], out starts);
indexByFirstLetter.TryGetValue(query[query.Length - 1], out ends);
foreach (ClassFoo cf in (starts ?? new List<ClassFoo>()).Union(ends ?? new List<ClassFoo>()))
{
    if (cf.word.StartsWith(query) || cf.word.EndsWith(query))
        result.Add(cf);
}
Now we possibly do not have to loop through as many ClassFoo instances as in your example, but then again we have to keep the index up to date. There is no guarantee that it is faster, but it is definitely more complicated.
Depends. Are all your objects always going to be loaded in memory? Do you have a finite limit of objects that may be loaded? Will your queries have to consider objects that haven't been loaded yet?
If the collection will get large, I would definitely use an index.
In fact, if the collection can grow to an arbitrary size and you're not sure that you will be able to fit it all in memory, I'd look into an ORM, an in-memory database, or another embedded database. XPO from DevExpress (an ORM) and SQLite.Net (an in-memory database) come to mind.
If you don't want to go that far, make a simple index consisting of the "word" member values mapping to class references.
If the set of possible criteria is fixed and small, you can assign a bitmask to each element in the list. The size of the bitmask is the size of the set of the criteria. When you create an element/add it to the list, you check which criteria it satisfies and then set the corresponding bits in the bitmask of this element. Matching the elements from the list will be as easy as matching their bitmasks with the target bitmask. A more general method is the Bloom filter.
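A minimal sketch of the bitmask idea against the earlier ClassFoo; the two criteria here are made up purely for illustration:

[Flags]
enum Criteria
{
    None            = 0,
    StartsWithVowel = 1 << 0,   // illustrative criterion
    ScoreAboveOne   = 1 << 1    // illustrative criterion
}

// Computed once, when the element is created or added to the list,
// and ideally stored on the element itself rather than recomputed.
static Criteria ComputeMask(ClassFoo cf)
{
    var mask = Criteria.None;
    if (cf.word.Length > 0 && "aeiou".IndexOf(char.ToLowerInvariant(cf.word[0])) >= 0)
        mask |= Criteria.StartsWithVowel;
    if (cf.score > 1f)
        mask |= Criteria.ScoreAboveOne;
    return mask;
}

// Matching is then a single bitwise test per element.
var target = Criteria.StartsWithVowel | Criteria.ScoreAboveOne;
var matches = collection.Where(cf => (ComputeMask(cf) & target) == target);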