Creating a hierarchical list from flat LINQ results - c#

Hoping someone can comment on a more effective way to do this: I have a generic list of work items that represents a user's to-do list. Currently, this is sorted and displayed in an Obout Treeview by due date only and works fine.
Now we're looking to update this by allowing users to sort their to-do list by applying a primary and secondary "filter" (i.e., by due date and then by received date or similar) to that treeview, such that the treeview displays the primary sort as the parent and the secondary sort as children. The actual items would be displayed as grandchildren, like so:
Due Date
- Received Date
-- Work Item
-- Work Item
- Received Date
-- Work Item
Due Date
... etc
Obout Treeview has some crucial restrictions, as far as I can tell:
Parent nodes must be created before their children
Nodes cannot be deleted once created
There is no method to see if other nodes (parent, sibling, child) exist, so you can't programmatically tell if a node would be a duplicate on the server side.
I'm modifying some old code, so be gentle with my example. I had to take out a lot to clarify what it's doing.
public void generateOboutTreeContent()
{
    // Add unique root nodes.
    switch (primarySort)
    {
        [...]
        case SortOption.ByDueDate:
            addNodesForDueDates(true);
            break;
        [...]
    }
    // Then add child nodes for each root node.
    switch (secondarySort)
    {
        [...]
        case SortOption.ByReceivedDate:
            addNodesForReceivedDates();
            break;
        [...]
    }
    // Finally, add all the actual items as grandchildren.
    foreach (WorkItem item in workQueue)
    {
        tree.Add(parentID, item.ID, item.url, false);
    }
}
private void addNodesForDueDates(bool isRootNode = false)
{
    var uniqueNodes = workQueue.GroupBy(i => i.DueDate)
                               .Select(group => group.First())
                               .ToList();
    foreach (WorkItem node in uniqueNodes)
    {
        var dueDate = node.DueDate;
        if (isRootNode)
        {
            tree.Add("root", dueDate, dueDate, false);
        }
        else
        {
            tree.Add(parentID, dueDate, dueDate, false);
        }
    }
}
How can I more effectively create the root-first hierarchy for the Obout tree from the generic list, with minimal traversing over the dataset again and again for unique values?
Creating the structure with hard-coded sorts is messy enough, but attempting to code this in a way that cleanly allows for user-defined sorts (without an explosion of subclasses or methods) has really got me stumped. I would love to hear any suggestions at all!
Thank you.

You're not sorting the data by those dates, you're grouping the data by those dates (and then sorting those groups).
To group items based on a field, simply use GroupBy.
You just need to group your items by the first field, then group each of those groups by the second field, and add ordering clauses as appropriate to order the groups themselves.
var query = from item in data
            group item by item.DueDate into dueDateGroup
            orderby dueDateGroup.Key
            select from item in dueDateGroup
                   group item by item.ReceivedDate into receivedDateGroup
                   orderby receivedDateGroup.Key
                   select receivedDateGroup;
Or, if you prefer to use method syntax:
var query2 = data.GroupBy(item => item.DueDate)
                 .OrderBy(group => group.Key)
                 .Select(dueDateGroup =>
                     dueDateGroup.GroupBy(item => item.ReceivedDate)
                                 .OrderBy(group => group.Key));
Once you've transformed the data into the appropriate model, translating that model into a TreeView should be straightforward: iterate each group and create a node for it, then iterate the sub-groups within it adding child nodes, and do the same for the contents of those sub-groups to add the grandchildren.
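For instance, here's a minimal sketch of that translation, reusing the tree.Add(parentId, id, text, expanded) call shape from the question (the composite node ids and the date formatting are assumptions, made so sibling dates under different parents can't collide):
// Keep both group keys by projecting into an anonymous type.
var model = data.GroupBy(item => item.DueDate)
                .OrderBy(g => g.Key)
                .Select(g => new
                {
                    DueDate = g.Key,
                    ByReceived = g.GroupBy(i => i.ReceivedDate)
                                  .OrderBy(rg => rg.Key)
                });

foreach (var due in model)
{
    // One root node per due date.
    string dueId = due.DueDate.ToShortDateString();
    tree.Add("root", dueId, dueId, false);

    foreach (var rec in due.ByReceived)
    {
        // Prefix the child id with its parent's id so the same received
        // date under two due dates doesn't create duplicate node ids.
        string recId = dueId + "|" + rec.Key.ToShortDateString();
        tree.Add(dueId, recId, rec.Key.ToShortDateString(), false);

        foreach (var item in rec)
        {
            // Grandchildren: the actual work items.
            tree.Add(recId, item.ID, item.url, false);
        }
    }
}
Because parents are always created before their children and the ids are unique by construction, this also satisfies the Obout restrictions listed in the question without any duplicate checking.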

Related

Performance between check exists before add to list and distinct in linq

In the foreach loop, I want to add the Products to a List, but I don't want the List to contain duplicates. I currently have two candidate solutions.
1. In the loop, before adding the Product to the List, check whether it already exists; if it doesn't, add it to the List.
foreach (var product in products)
{
    // code logic
    if (!listProduct.Any(x => x.Id == product.Id))
    {
        listProduct.Add(product);
    }
}
2. In the loop, add all the Products to the List even if there are duplicates. Then, outside the loop, use Distinct to remove the duplicate records.
foreach (var product in products)
{
    // code logic
    listProduct.Add(product);
}
listProduct = listProduct.Distinct().ToList();
I wonder which of these two ways is the most effective. Or does anyone have another idea for adding records to the List while avoiding duplication?
I'd go for a third approach: the HashSet. It has a constructor overload that accepts an IEnumerable. This constructor removes duplicates:
If the input collection contains duplicates, the set will contain one
of each unique element. No exception will be thrown.
Source: HashSet<T> Constructor
usage:
List<Product> myProducts = ...;
var setOfProducts = new HashSet<Product>(myProducts);
After removing duplicates there is no proper meaning for setOfProducts[4].
Therefore a HashSet is not an IList<Product> but an ICollection<Product>: you can Count / Add / Remove, and so on; everything you can do with a List except fetch by index.
First take the elements that are not already in the collection:
var newProducts = products.Where(x => !listProduct.Any(y => x.Id == y.Id));
Then just add them using AddRange:
listProduct.AddRange(newProducts);
Or you can use a foreach loop:
foreach (var product in newProducts)
{
    listProduct.Add(product);
}
One more easy solution: there is no need to use Distinct if you use Union, which removes duplicates itself:
var newProductList = products.Union(listProduct).ToList();
But Union does not perform as well.
From what you have included, you are storing everything in memory. If that's the case, or if you persist only after the list is ready, you can consider using BinarySearch: https://msdn.microsoft.com/en-us/library/w4e7fxsh(v=vs.110).aspx, and you also get an ordered list at the end. If ordering is not important, use a HashSet, which is very fast and meant specifically for this purpose.
Check also: https://www.dotnetperls.com/hashset
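For example, here's a minimal sketch of the BinarySearch approach, assuming Products are compared by an Id property (the comparer is an assumption):
// Comparer that orders Products by Id (assumed identity field).
var byId = Comparer<Product>.Create((a, b) => a.Id.CompareTo(b.Id));
var sorted = new List<Product>();

foreach (var product in products)
{
    // BinarySearch returns the bitwise complement of the insertion index
    // when the item isn't found, so a negative result means "not a
    // duplicate" and ~index is exactly where the new item belongs.
    int index = sorted.BinarySearch(product, byId);
    if (index < 0)
    {
        sorted.Insert(~index, product);
    }
}
This keeps the list sorted and duplicate-free in one pass, at the cost of O(n) inserts into the middle of the list.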
This should be pretty fast and take care of any ordering:
// build a HashSet of your primary keys type (I'm assuming integers here) containing all your list elements' keys
var hashSet = new HashSet<int>(listProduct.Select(p => p.Id));
// add all items from the products list whose Id can be added to the hashSet (so it's not a duplicate)
listProduct.AddRange(products.Where(p => hashSet.Add(p.Id)));
What you might want to consider doing instead, though, is implementing IEquatable<Product> and overriding GetHashCode() on your Product type, which would make the above code a little simpler and put the equality checks where they belong (inside the respective type):
var hashSet = new HashSet<Product>(listProduct);
listProduct.AddRange(products.Where(hashSet.Add));
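A minimal sketch of what that might look like, assuming Id is the identity field:
public class Product : IEquatable<Product>
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Two Products are considered equal when their Ids match (assumed).
    public bool Equals(Product other) => other != null && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as Product);

    public override int GetHashCode() => Id.GetHashCode();
}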

Filtering a list of HtmlElements based on a list of partial ids

I've got an HtmlElementCollection and I want to use Linq to get a list of HtmlElements whose ids contain an id from another list.
So I've tried a couple of things none of which worked out. I get a list from the collection and try to filter it.
This is the list of partial ids. The element ids are distinct; each corresponds to an entry in this list plus some random-seeming numbers at the beginning.
string[] ids = {"btadminh_struct.description",
"thtmlb_textView_6",
"thtmlb_textView_7",
"btadminh_struct.object_id",
"thtmlb_textView_12",
"zbtsalesset_struct.po_number_sold",
"thtmlb_textView_17",
"thtmlb_textView_21",
"thtmlb_textView_24",
"btcustomerh_z_followupdate",
"thtmlb_textView_29",
"btrefobjmain_ibibase",
"btrefobjmain_ibinstancedesc",
"btpartnerserviceto_struct.description_name",
"btpartnerset_contact_name",
"zzericempresp_struct.partner_no",
"zbtcsrowner_struct.partner_no",
"btcustomerh_struct.zcomments",
"thtmlb_textView_19",
"btadminh_servicecontractdescr",
"btcustomerh_zcontracttype_descr",
"btrefobjmain_network_id",
"btrefobjmain_node_id",
"btrefobjmain_site_id"};
Element ids look like this:
"C29_W87_V88_btrefobjmain_network_instance",
"C29_W87_V88_btrefobjmain_network_id__items",
"C29_W87_V88_btrefobjmain_network_id",
"C29_W87_V88_btrefobjmain_network_id-btn",
"C29_W87_V88_btrefobjmain_network_id__key",
"C29_W87_V88_thtmlb_label_2",
"C29_W87_V88_btrefobjmain_service_id__items",
"C29_W87_V88_btrefobjmain_service_id",
"C29_W87_V88_btrefobjmain_service_id-btn",
"C29_W87_V88_btrefobjmain_service_id__key",
"C29_W87_V88_thtmlb_label_3",
"C29_W87_V88_btrefobjmain_networkadap_id__items",
"C29_W87_V88_btrefobjmain_networkadap_id",
"C29_W87_V88_btrefobjmain_networkadap_id-btn",
"C29_W87_V88_btrefobjmain_networkadap_id__key",
So I've put my collection into a List that I can query.
var elems = doc.All.Cast<HtmlElement>();
I've tried different approaches, none of which are quite working. I'd also like to use Linq and avoid an ugly 2-D foreach loop.
Any ideas?
So, something like: elems.Where(x => ids.Any(id => x.Id.Contains(id)))
What this does is go through each item in elems (your html element list) and, for each one, check every id in your id collection; if any of them match, that element is returned.
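Putting it together (the null/empty guard is a precaution I've added, since elements with no id would otherwise be matched or could fail):
var matches = doc.All.Cast<HtmlElement>()
                 .Where(x => !string.IsNullOrEmpty(x.Id)
                          && ids.Any(id => x.Id.Contains(id)))
                 .ToList();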

SortedHashTable in c#

What I am trying to do is implement a heuristic approach to an NP-complete problem: I have a list of objects (matches), each of which has a double score. I take the first element of the list sorted by score descending, then remove it from the list. All elements bound to that first one are also removed. I iterate through the list until I have no more elements.
I need a data structure which can efficiently solve this problem; basically it should have the following properties:
1. Generic
2. Is always sorted
3. Has a fast key access
Right now SortedSet<T> looks like the best fit.
The question is: is it the optimal choice in my case?
var result = new List<Match>();
while (sortedItems.Any())
{
    var first = sortedItems.First();
    result.Add(first);
    sortedItems.Remove(first);
    foreach (var dependentFirst in first.DependentElements)
    {
        sortedItems.Remove(dependentFirst);
    }
}
What I need is something like sorted hash table.
I assume you don't just want to clear the list; you want to do something with each item as it's removed.
var toDelete = new HashSet<T>();
foreach (var item in sortedItems)
{
    if (!toDelete.Contains(item))
    {
        toDelete.Add(item);
        // do something with item here
    }
    foreach (var dependentFirst in item.DependentElements)
    {
        if (!toDelete.Contains(dependentFirst))
        {
            toDelete.Add(dependentFirst);
            // do something with dependentFirst here
        }
    }
}
sortedItems.RemoveAll(i => toDelete.Contains(i));
I think you should use two data structures - a heap and a set - the heap for keeping the sorted items, the set for keeping the removed items. Fill the heap with the items, then remove the top one, adding it and all its dependents to the set. Take the next one - if it's already in the set, ignore it and move on; otherwise add it and its dependents to the set.
Each time you add an item to the set, also do whatever it is you plan to do with the items.
The complexity here is O(N log N), and you won't get any better than that, as you have to sort the list of items anyway. If you want to squeeze out more performance, you can add a 'Removed' boolean to each item and set it to true instead of keeping a separate set of removed items. I don't know if this is applicable in your case.
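A minimal sketch of the heap-and-set idea, assuming a Match class shaped like the question's code, where matches is the original input list (the Id tie-breaker is an assumption, needed so SortedSet doesn't silently drop items with equal scores):
class Match
{
    public int Id { get; set; }                    // unique id (assumed)
    public double Score { get; set; }
    public List<Match> DependentElements { get; } = new List<Match>();
}

// Order by score descending, breaking ties on Id so two matches with
// the same score aren't treated as duplicates by the SortedSet.
var byScoreDesc = Comparer<Match>.Create((a, b) =>
{
    int cmp = b.Score.CompareTo(a.Score);
    return cmp != 0 ? cmp : a.Id.CompareTo(b.Id);
});

var sorted = new SortedSet<Match>(matches, byScoreDesc);
var removed = new HashSet<Match>();
var result = new List<Match>();

foreach (var match in sorted)       // visits items in descending score order
{
    if (removed.Contains(match)) continue;
    result.Add(match);
    foreach (var dep in match.DependentElements)
        removed.Add(dep);           // mark dependents instead of deleting them
}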
If I'm not mistaken, you want something like this:
var dictionary = new Dictionary<string, int>();
dictionary.Add("car", 2);
dictionary.Add("apple", 1);
dictionary.Add("zebra", 0);
dictionary.Add("mouse", 5);
dictionary.Add("year", 3);
dictionary = dictionary.OrderBy(o => o.Key).ToDictionary(o => o.Key, o => o.Value);
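Note that Dictionary<TKey, TValue> makes no guarantee about enumeration order, so the OrderBy/ToDictionary trick above is fragile. If the collection should stay sorted by key as items are added, SortedDictionary<TKey, TValue> does that by construction:
var sorted = new SortedDictionary<string, int>();
sorted.Add("zebra", 0);
sorted.Add("apple", 1);
// Enumerates as apple, zebra regardless of insertion order.
foreach (var pair in sorted)
    Console.WriteLine($"{pair.Key} = {pair.Value}");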

Build Tree more efficiently?

I was wondering if this code is good enough or if there are glaring newbie no-no's.
Basically I'm populating a TreeView listing all Departments in my database. Here is the Entity Framework model:
Here is the code in question:
private void button1_Click(object sender, EventArgs e)
{
    DepartmentRepository repo = new DepartmentRepository();
    var parentDepartments = repo.FindAllDepartments()
                                .Where(d => d.IDParentDepartment == null)
                                .ToList();
    foreach (var parent in parentDepartments)
    {
        TreeNode node = new TreeNode(parent.Name);
        treeView1.Nodes.Add(node);
        var children = repo.FindAllDepartments()
                           .Where(x => x.IDParentDepartment == parent.ID)
                           .ToList();
        foreach (var child in children)
        {
            node.Nodes.Add(child.Name);
        }
    }
}
EDIT:
Good suggestions so far. Working with the entire collection makes sense, I guess. But what happens if the collection is huge, say 200,000 entries? Wouldn't this break my software?
DepartmentRepository repo = new DepartmentRepository();
var entries = repo.FindAllDepartments();
var parentDepartments = entries
    .Where(d => d.IDParentDepartment == null)
    .ToList();
foreach (var parent in parentDepartments)
{
    TreeNode node = new TreeNode(parent.Name);
    treeView1.Nodes.Add(node);
    var children = entries.Where(x => x.IDParentDepartment == parent.ID)
                          .ToList();
    foreach (var child in children)
    {
        node.Nodes.Add(child.Name);
    }
}
Since you are getting all of the departments anyway, why not do it in one query: get all of the departments once, then run your queries against the in-memory collection instead of the database. That would be much more efficient.
In a more general sense, any database model that is recursive can lead to issues, especially if the structure could end up fairly deep. One thing to consider would be for each department to store all of its ancestors, so that you could fetch them in a single query instead of walking up the hierarchy one level at a time.
In light of your edit, you might want to consider an alternative database schema that scales to handle very large tree structures.
There's an explanation on the FogBugz blog of how they handle hierarchies. They also link to this article by Joe Celko for more information.
Turns out there's a pretty cool solution for this problem explained by Joe Celko. Instead of attempting to maintain a bunch of parent/child relationships all over your database -- which would necessitate recursive SQL queries to find all the descendents of a node -- we mark each case with a "left" and "right" value calculated by traversing the tree depth-first and counting as we go. A node's "left" value is set whenever it is first seen during traversal, and the "right" value is set when walking back up the tree away from the node. A picture probably makes more sense:
The Nested Set SQL model lets us add case hierarchies without sacrificing performance.
How does this help? Now we just ask for all the cases with a "left" value between 2 and 9 to find all of the descendents of B in one fast, indexed query. Ancestors of G are found by asking for nodes with "left" less than 6 (G's own "left") and "right" greater than 6. Works in all databases. Greatly increases performance -- particularly when querying large hierarchies.
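In LINQ terms, and assuming Left and Right properties on the entity (the names are assumptions), those two queries from the quote look something like this:
// Descendants of node b: every node whose interval is nested inside b's.
var descendants = departments.Where(d => d.Left > b.Left && d.Right < b.Right);

// Ancestors of node g: every node whose interval encloses g's.
var ancestors = departments.Where(d => d.Left < g.Left && d.Right > g.Right);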
Assuming you are getting the data from a database, the first thing that comes to mind is that you are going to hit the database n+1 times: once for the parents plus once per parent. You should try to get the whole tree structure out in one hit.
Secondly, you seem to have a handle on patterns, seeing as you are using the repository pattern, so you might want to look at IoC. It allows you to inject your dependency on a particular object, such as your repository, into the class where it is going to be used, allowing for easier unit testing.
Thirdly, regardless of where you get your data from, move the structuring of the data into a tree into a service which returns an object containing all your departments already organised (this basically becomes a DTO). This will help you reduce code duplication.
With anything, you need to apply the YAGNI principle: only do something if you are going to need it. So if the code you have provided above is complete, needs no further work, and is functional, don't touch it. The same goes for the select n+1 performance issue; if you are not seeing any performance hits, don't do anything, as it may be premature optimization.
In your edit
DepartmentRepository repo = new DepartmentRepository();
var entries = repo.FindAllDepartments();
var parentDepartments = entries.Where(d => d.IDParentDepartment == null).ToList();
foreach (var parent in parentDepartments)
{
    TreeNode node = new TreeNode(parent.Name);
    treeView1.Nodes.Add(node);
    var children = entries.Where(x => x.IDParentDepartment == parent.ID).ToList();
    foreach (var child in children)
    {
        node.Nodes.Add(child.Name);
    }
}
You still have an n+1 issue. This is because the data is only retrieved from the database when you call ToList() or iterate over the enumeration, so each Where above triggers its own query. This would be better:
var entries = repo.FindAllDepartments().ToList();
var parentDepartments = entries.Where(d => d.IDParentDepartment == null);
foreach (var parent in parentDepartments)
{
    TreeNode node = new TreeNode(parent.Name);
    treeView1.Nodes.Add(node);
    var children = entries.Where(x => x.IDParentDepartment == parent.ID);
    foreach (var child in children)
    {
        node.Nodes.Add(child.Name);
    }
}
That looks OK to me, but think about a collection of hundreds of thousands of nodes. The best way to handle that is asynchronous loading - note that you don't necessarily have to load all elements at the same time. Your tree view can be collapsed by default, and you can load additional levels as the user expands the tree's nodes. Consider this case: you have a root node containing 100 nodes, and each of those contains at least 1,000 nodes. 100 * 1000 = 100,000 nodes to load - quite a lot, isn't it? To reduce the database traffic you can first load the 100 top-level nodes and then, when the user expands one of them, load its 1,000 children. That saves a considerable amount of time.
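A minimal WinForms sketch of that idea: give each node a dummy child so the expand glyph shows, then swap in the real children on BeforeExpand (LoadChildDepartments is a hypothetical data-access helper):
// At initial load, add only the top-level nodes, each with a placeholder.
TreeNode parent = new TreeNode(department.Name) { Tag = department.ID };
parent.Nodes.Add("loading...");
treeView1.Nodes.Add(parent);

// Fetch the real children only when the user expands a node.
treeView1.BeforeExpand += (s, e) =>
{
    if (e.Node.Nodes.Count == 1 && e.Node.Nodes[0].Text == "loading...")
    {
        e.Node.Nodes.Clear();
        foreach (var child in LoadChildDepartments((int)e.Node.Tag))
        {
            var childNode = new TreeNode(child.Name) { Tag = child.ID };
            childNode.Nodes.Add("loading...");   // placeholder for the next level
            e.Node.Nodes.Add(childNode);
        }
    }
};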
Things that come to mind:
It looks like .ToList() is needless. If you are simply iterating over the returned result, why bother with the extra step?
Move this logic into its own method, out of the event handler.
As others have said, you could get the whole result in one call. Sort by IDParentDepartment so that the null ones come first; that way you can fetch the list of departments in one call and iterate over it only once, adding child departments to already-existing parent nodes (see the sketch after this list).
Wrap the TreeView modifications with:
treeView.BeginUpdate();
// modify the tree here.
treeView.EndUpdate();
To get better performance.
Pointed out here by jgauffin
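Here's a minimal sketch of the one-call, single-pass suggestion above, combined with BeginUpdate/EndUpdate (it assumes the two-level structure from the question and a nullable IDParentDepartment):
// One database hit, then everything happens in memory.
var all = repo.FindAllDepartments().ToList();
var nodesById = new Dictionary<int, TreeNode>();

treeView1.BeginUpdate();
// Roots (null IDParentDepartment) sort first, so every parent node
// already exists in the dictionary by the time its children appear.
foreach (var dept in all.OrderBy(d => d.IDParentDepartment != null))
{
    var node = new TreeNode(dept.Name) { Tag = dept.ID };
    nodesById[dept.ID] = node;

    if (dept.IDParentDepartment == null)
        treeView1.Nodes.Add(node);
    else
        nodesById[dept.IDParentDepartment.Value].Nodes.Add(node);
}
treeView1.EndUpdate();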
This should use only one (albeit possibly large) call to the database:
Departments.Join(
        Departments,
        x => x.IDParentDepartment,
        x => x.ID,
        (o, i) => new { Child = o, Parent = i })
    .GroupBy(x => x.Parent)
    .Map(x =>
    {
        var node = new TreeNode(x.Key.Name);
        x.Map(y => node.Nodes.Add(y.Child.Name));
        treeView1.Nodes.Add(node);
    });
Where 'Map' is just a 'ForEach' for IEnumerables:
public static void Map<T>(this IEnumerable<T> source, Action<T> func)
{
    foreach (T i in source)
        func(i);
}
Note: this will still not help if the Departments table is huge, as 'Map' materializes the result of the SQL statement much like 'ToList()' does. You might consider Piotr's answer instead.
In addition to Bronumski's and Keith Rousseau's answers:
Also store the DepartmentID with the nodes (in the Tag property) so that you don't have to re-query the database to get the DepartmentID.

LINQ to Entities Question - All objects where all items in subcollection appear in another collection?

Hopefully I can explain this so it makes sense, but I'm trying to get a list of objects out of a master list using a specific and complex (complex to me, at least) set of criteria.
I have a class called TableInfo that exposes a List of ForeignKeyInfo. ForeignKeyInfo has a string property (among others) called Table. I need to do some sequential processing using my TableInfo objects, but only work with the TableInfo objects I haven't yet processed. To keep track of which TableInfo objects have already been processed, I have a List which stores the name of the table after processing is complete.
I want to loop until all of the items in my TableInfo collection appear in my processed list. In each iteration of the loop, I should process all of the TableInfo items where all of the ForeignKeyInfo.Table strings appear in my processed List.
Here's how I've written it in "standard" looping code:
while (processed.Count != _tables.Count)
{
    List<TableInfo> thisIteration = new List<TableInfo>();
    foreach (TableInfo tab in _tables)
    {
        bool allFound = true;
        foreach (ForeignKeyInfo fk in tab.ForeignKeys)
        {
            allFound = allFound && processed.Contains(fk.Table);
        }
        if (allFound && !processed.Contains(tab.Name))
        {
            thisIteration.Add(tab);
        }
    }
    // now do processing using the thisIteration list
    // "thisIteration" is what I'd like to replace with the result from LINQ
}
This should do it:
var thisIteration = _tables.Where(t => !processed.Contains(t.Name)
                                       && t.ForeignKeys
                                           .All(fk => processed.Contains(fk.Table)));
I'm assuming you just need to iterate over the thisIteration collection, in which case leaving it as an IEnumerable is fine. If you need it to be a list, you can just put in a .ToList() call at the end.
I'm not really sure what you're trying to do here. However, you can convert the body of your loop into the following LINQ query, if that makes things simpler...
List<TableInfo> thisIteration = (from tab in _tables
                                 let allFound = tab.ForeignKeys.Aggregate(true,
                                     (current, fk) => current && processed.Contains(fk.Table))
                                 where allFound && !processed.Contains(tab.Name)
                                 select tab).ToList();
