Thanks again for all the wonderful answers you have all posted!
I have two tables in SQL. The first defines the parent, and has a primary key column called ParentId. I also have a child table that has a primary key, and a foreign key as 'ParentId'. So the two tables form a one parent - to many children relationship.
The question is: what is the most efficient way to pull the parent + child data into C# code? The data has to be read into the following objects:
public class Parent
{
public int ParentId { get; set; }
public List<Child> Children { get; set; }
// ... many more properties ... //
}
public class Child
{
public int ChildId { get; set; }
public string Description { get; set; }
// ... many more properties ... //
}
If I use the following query I will get the parents and children at once, where each parent is repeated as many times as it has children:
SELECT
    p.ParentId AS 'ParentId',
    c.ChildId AS 'ChildId'
    -- other relevant fields --
FROM Parents p
INNER JOIN Children c ON p.ParentId = c.ParentId
Using this approach I'd have to find all the unique parent rows, and then read all the children. The advantage is that I only make 1 trip to the db.
The second version of this is to read all parents separately:
SELECT * FROM Parents
and then read all children separately:
SELECT * FROM Children
and use LINQ to merge all parents with children. This approach makes 2 trips to the db.
The third and final (also most inefficient) approach is to grab all parents and, while constructing each parent object, make a trip to the DB to grab all its children. This approach takes n+1 trips: 1 for all parents and n trips to get the children of each parent.
Any advice on how to do this more easily? Granted, I can't get away from using stored procedures, and I can't use LINQ2SQL or EF. Would you prefer DataTables vs. DataReaders, and if so, how would you use either with approach 1 or 2?
Thanks,
Martin
I prefer pulling all results in one query and just building the tree in one loop:
SELECT p.ParentId AS 'ParentId', NULL AS 'ChildId'
FROM Parents p
UNION ALL
SELECT c.ParentId AS 'ParentId', c.ChildId AS 'ChildId'
FROM Children c
ORDER BY ParentId, ChildId -- NULL sorts first, so each parent row precedes its children
List<Parent> result = new List<Parent>();
Parent current = null;
while (dr.Read())
{
    if (dr["ChildId"] == DBNull.Value)
    {
        // Parent row: create and initialize the parent object and set it as current
        current = new Parent { ParentId = (int)dr["ParentId"], Children = new List<Child>() };
        result.Add(current);
    }
    else if (current != null && (int)dr["ParentId"] == current.ParentId)
    {
        // Child row: create and initialize the child, then add it to the current parent
        current.Children.Add(new Child { ChildId = (int)dr["ChildId"] });
    }
}
"Using this approach I'd have to find all the unique parent rows, and then read all the children."
You could just include an ORDER BY p.ParentId. This ensures all children of the same parent are in consecutive rows: read the next row; if the parent has changed, create a new parent object, otherwise add the child to the previous parent. No need to search for unique parent rows.
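To make the ordered, single-pass merge concrete, here is a minimal in-memory sketch; the row and class shapes are placeholders standing in for the actual reader rows, not the real schema:

```csharp
using System;
using System.Collections.Generic;

class ParentNode
{
    public int ParentId;
    public List<int> ChildIds = new List<int>();
}

static class OrderedMerge
{
    // rows must already be sorted by parentId (the ORDER BY does this);
    // a new parent object is started whenever the id changes
    public static List<ParentNode> Build(IEnumerable<(int parentId, int childId)> rows)
    {
        var result = new List<ParentNode>();
        ParentNode current = null;
        foreach (var row in rows)
        {
            if (current == null || current.ParentId != row.parentId)
            {
                // parent changed: create a new parent and make it current
                current = new ParentNode { ParentId = row.parentId };
                result.Add(current);
            }
            current.ChildIds.Add(row.childId);
        }
        return result;
    }
}
```

The same loop body works against a SqlDataReader; the tuple sequence here just makes the grouping logic easy to see.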
I usually make this decision at the table level. Some tables I need the children often, so I grab them right away. In other cases accessing the children is a rarity, so I lazy-load them.
I would guess option #2 would be more efficient bandwidth-wise than option #1 (as you're not repeating any data).
You can have both queries in a single stored procedure, and execute the procedure through code using a SqlDataAdapter (i.e. new SqlDataAdapter(command).Fill(myDataSet), where myDataSet would contain the two tables).
From there you'd read the first table, creating a dictionary of the parents (in a Dictionary<int, Parent>) by ParentId, then simply read each row in the 2nd table to add the children:
foreach (DataRow row in myDataSet.Tables[1].Rows)
    parents[(int)row["ParentId"]].Children.Add(new Child() { /* etc */ });
The pseudo code may be off a bit, but hopefully you get the general idea.
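As a rough sketch of that dictionary merge with plain collections (the class shapes mirror the question, the method name is illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;

class Child  { public int ChildId; }
class Parent { public int ParentId; public List<Child> Children = new List<Child>(); }

static class TwoQueryMerge
{
    // parentIds: result of "SELECT * FROM Parents";
    // childRows: (ParentId, ChildId) pairs from "SELECT * FROM Children"
    public static List<Parent> Build(IEnumerable<int> parentIds,
                                     IEnumerable<(int parentId, int childId)> childRows)
    {
        // index the parents once, then attach each child with an O(1) lookup
        var byId = parentIds.ToDictionary(id => id, id => new Parent { ParentId = id });
        foreach (var (parentId, childId) in childRows)
            byId[parentId].Children.Add(new Child { ChildId = childId });
        return byId.Values.ToList();
    }
}
```

This is the whole of approach #2: two result sets, one dictionary, one pass over the child rows.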
Hoping someone can comment on a more effective way to do this: I have a generic list of work items that represents a user's to-do list. Currently, this is sorted and displayed in an Obout Treeview by due date only and works fine.
Now we're looking to update this by allowing users to sort their to-do list by applying a primary and secondary "filter" (i.e., by due date and then by received date or similar) to that treeview, such that the treeview displays the primary sort as the parent and the secondary sort as children. The actual items would be displayed as grandchildren, like so:
Due Date
- Received Date
-- Work Item
-- Work Item
- Received Date
-- Work Item
Due Date
... etc
Obout Treeview has some crucial restrictions, as far as I can tell:
Parent nodes must be created before their children
Nodes cannot be deleted once created
There is no method to see if other nodes (parent, sibling, child) exist, so you can't programmatically tell if a node would be a duplicate on the server side.
I'm modifying some old code, so be gentle with my example. I had to take out a lot to clarify what it's doing.
public void generateOboutTreeContent()
{
// Add unique root nodes.
switch (primarySort)
{
[...]
case SortOption.ByDueDate:
addNodesForDueDates(true);
break;
[...]
}
// Then add child nodes for each root node.
switch (secondarySort)
{
[...]
case SortOption.ByReceivedDate:
addNodesForReceivedDates();
break;
[...]
}
// Finally, add all the actual items as grandchildren.
foreach (WorkItem item in WorkQueue)
{
tree.Add(parentID, item.ID, item.url, false);
}
}
private void addNodesForDueDates(bool isRootNode = false)
{
var uniqueNodes = workQueue.GroupBy(i => i.DueDate).Select(group => group.First()).ToList();
foreach (WorkItem node in uniqueNodes)
{
var dueDate = node.DueDate;
if (isRootNode)
{
tree.Add("root", dueDate, dueDate, false);
}
else
{
tree.Add(parentID, dueDate, dueDate, false);
}
}
}
How can I more effectively create the root-first hierarchy for the Obout tree from the generic list, with minimal traversing over the dataset again and again for unique values?
Creating the structure with hard-coded sorts is messy enough, but attempting to code this in a way that cleanly allows for user-defined sorts (without an explosion of subclasses or methods) has really got me stumped. I would love to hear any suggestions at all!
Thank you.
You're not sorting the data by those dates, you're grouping the data by those dates (and then sorting those groups).
To group items based on a field, simply use GroupBy.
You just need to group your items by the first field, then group each of those groups on the second field, and add in the ordering clauses as appropriate to order the groups themselves.
var query = from item in data
            group item by item.DueDate into dueDateGroup
            orderby dueDateGroup.Key
            select (from item in dueDateGroup
                    group item by item.ReceivedDate into receivedDateGroup
                    orderby receivedDateGroup.Key
                    select receivedDateGroup);
Or, if you prefer to use method syntax:
var query2 = data.GroupBy(item => item.DueDate)
                 .OrderBy(group => group.Key)
                 .Select(dueDateGroup =>
                     dueDateGroup.GroupBy(item => item.ReceivedDate)
                                 .OrderBy(group => group.Key));
Once you've transformed the data into the appropriate model, translating that model into a TreeView should be straightforward: iterate each group and create an item for it, then iterate the items in that group, adding child nodes for each item, and do the same for those children (which are themselves groups) to add the grandchildren.
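Here is a minimal sketch of that traversal, with indented strings standing in for tree nodes since the Obout API isn't available here (the WorkItem shape is assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class WorkItem
{
    public DateTime DueDate;
    public DateTime ReceivedDate;
    public string Title;
}

static class TreeBuilder
{
    // Flattens the two-level grouping into indented lines, mirroring the
    // root -> child -> grandchild node creation order the tree view requires.
    public static List<string> Render(IEnumerable<WorkItem> data)
    {
        var lines = new List<string>();
        foreach (var dueGroup in data.GroupBy(i => i.DueDate).OrderBy(g => g.Key))
        {
            lines.Add(dueGroup.Key.ToString("yyyy-MM-dd"));              // root node
            foreach (var recvGroup in dueGroup.GroupBy(i => i.ReceivedDate)
                                              .OrderBy(g => g.Key))
            {
                lines.Add("- " + recvGroup.Key.ToString("yyyy-MM-dd"));  // child node
                foreach (var item in recvGroup)
                    lines.Add("-- " + item.Title);                       // grandchild
            }
        }
        return lines;
    }
}
```

Because parents are always emitted before their children, this also satisfies the Obout restriction that parent nodes must be created first.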
I've seen a number of "Bulk Insert in EF" questions however all of these deal with a usecase where a user is trying to insert a large array of items.
I've a situation where I have a new Parent entity with ~1500 new related entities attached to it. Both the parent and the child entities are mapped to their own tables in EF.
At the moment I'm using something like:
//p is already created and contains all the new child items
public void SaveBigThing(Parent p){
if(p.Id == 0){
// we've got a new object to add
db.BigObjects.Add(p);
}
db.SaveChanges();
}
Entity Framework at the moment creates an individual insert statement for each and every child item, which takes 50 seconds or so. I want to be able to use db.ChildEntity.AddRange(items), but I'm unsure if there's a better way than using two separate operations: first create the parent to get its Id, then AddRange for all the child items?
IMHO you don't need to add the parent first in order to insert the child items. You can do that in one shot.
You could try this in EF 5; AddRange is only available in EF 6 or higher.
This will not insert the items in bulk; it will build up the inserts and submit them in one shot.
Ef bulk insertion reference
Another reference
public void InsertParent(Parent parentObject)
{
    // Assuming db.BigObjects is the parent table
    db.BigObjects.Add(parentObject);
    InsertChildren(parentObject); // Insert the children
    db.SaveChanges();
}

public void InsertChildren(Parent parentObject)
{
    // Turning change detection off saves a lot of time for large graphs
    db.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        foreach (var child in parentObject.Children)
        {
            // This sets the child-parent relation
            child.Parent = parentObject;
            db.ChildEntity.Add(child);
        }
    }
    finally
    {
        db.Configuration.AutoDetectChangesEnabled = true;
    }
}
I've been looking for an answer everywhere, but can't find anything. I have two tables, Media and Keywords, which have a many to many relationship. Now the Keywords table is quite simple - it has a ID, Name and ParentFK column that relates to ID column (it's a tree structure).
The user can assign any single keyword to the media file, which means that he can select a leaf without selecting the root or branch.
Now I have to be able to determine if a root keyword has any child, grandchild etc. which is assigned to a media object, but I have to do it from the root.
Any help will be appreciated.
Just look for any entry which has the given ID set as its ParentFK.
public static bool HasChild(int id)
{
    return db.Keywords.Any(item => item.ParentFK == id);
}

public static bool HasGrandChilds(int id)
{
    // ToList() runs the query first; HasChild can't be translated to SQL
    return db.Keywords
             .Where(item => item.ParentFK == id)
             .ToList()
             .Any(item => HasChild(item.ID));
}
A more generic way:
public static bool HasGrandChilds(int id, int depth)
{
    // Ids of the keywords found at the current level; start from the given root
    var ids = new List<int> { id };
    for (var i = 0; i < depth; i++)
    {
        // Search all entries whose parent is in the current level
        ids = db.Keywords
                .Where(item => ids.Contains(item.ParentFK))
                .Select(item => item.ID)
                .ToList();
        if (!ids.Any())
        {
            // No more children were found, so the searched depth doesn't exist
            return false;
        }
    }
    return true;
}
From your current schema I can't think of a better solution than the following:
Issue a query to retrieve a list of all children of the root.
Issue queries to retrieve a list of all children of the children from the previous step.
So on, recursively to create a list of all descendants of the root.
Next query the DB for all media objects that have any of the keywords in the list.
But the above algorithm will entail multiple calls to the DB. You can make it a single query if you refine your schema a little. I would suggest that you keep for each keyword not only its parent FK, but also its root FK. This way you could issue a single query to get all objects that have a keyword whose root FK is the desired one.
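An in-memory sketch of what that single query buys you, assuming the suggested RootFK column; the Keyword and Media shapes and the RootFK name are illustrative, not the actual schema:

```csharp
using System.Collections.Generic;
using System.Linq;

class Keyword { public int ID; public int? ParentFK; public int RootFK; }
class Media   { public int ID; public List<int> KeywordIds = new List<int>(); }

static class RootQuery
{
    // All media tagged with any keyword whose tree root is rootId,
    // without walking the hierarchy at query time.
    public static List<Media> MediaUnderRoot(IEnumerable<Keyword> keywords,
                                             IEnumerable<Media> media, int rootId)
    {
        var idsUnderRoot = new HashSet<int>(
            keywords.Where(k => k.RootFK == rootId).Select(k => k.ID));
        return media.Where(m => m.KeywordIds.Any(idsUnderRoot.Contains)).ToList();
    }
}
```

Against the database the same shape becomes one join on RootFK instead of a recursive descent.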
Sometimes you get one of those days when, no matter how much you batter your head against the wall, even the simplest task eludes you (this is one of those days!).
So what I have is a list of categories
CategoryID  CategoryName    ParentID  Lineage
1           Root Category   NULL      /1/
2           Child Category  1         /1/2/
3           Grandchild      2         /1/2/3/
4           Second Root     NULL      /4/
5           Second Child    2         /1/2/5/
I've created a class to hold this where it contains all the values above, plus
ICollection<Category> Children;
This should create the tree
Root Category
`-- Child Category
    |-- Grandchild
    `-- Second Child
Second Root
So I'm trying to add a new category to the tree given the Lineage and the element, I convert the lineage to a queue and throw it into this function.
public void AddToTree(ref Category parentCategory, Category newCategory, Queue<Guid> lineage)
{
Guid lastNode = lineage.Dequeue();
if(lastNode == newCategory.CategoryId)
{
parentCategory.Children.Add(newCategory);
return;
}
foreach (var category in parentCategory.Children)
{
if(category.CategoryId == lastNode)
{
this.AddToTree(ref category, newCategory, lineage);
}
}
}
Now, two problems I'm having:
The self-referencing isn't too worrying (it's designed to be recursive), but since the category in the foreach loop is a local iteration variable, I can't pass it by reference and use it as a pointer.
I'm sure there has to be an easier way than this!
Any pointers would be greatly received.
This code seems to be what you are looking for, but without any self-references or recursion: it goes through the tree along the given lineage and inserts the given category at the end of the lineage.
Several assumptions:
Tree is stored as a list of its roots
lineage is a string
void AddCategory(List<Category> roots, Category categoryToAdd, string lineage)
{
    // "/1/2/5/" -> ids; drop the empty strings the split produces
    List<Guid> categoryIdList = lineage
        .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(id => new Guid(id))
        .ToList();
    IEnumerable<Category> currentNodes = roots;
    Category parentNode = null;
    // Walk down to the parent; the last id in the lineage is the new category itself
    foreach (Guid categoryId in categoryIdList.Take(categoryIdList.Count - 1))
    {
        parentNode = currentNodes.Single(category => category.CategoryId == categoryId);
        currentNodes = parentNode.Children;
    }
    if (parentNode == null)
        roots.Add(categoryToAdd); // lineage contained only the new id: it's a root
    else
        parentNode.Children.Add(categoryToAdd);
}
You don't appear to need the "ref" at all. You are not modifying the object reference, just its state.
EDIT:
If you must use ref, then use a temporary variable, for example...
foreach (var temp in parentCategory.Children)
{
Category category = temp;
if (category.CategoryId == lastNode)
{
this.AddToTree(ref category, newCategory, lineage);
}
}
But even with this, the ref is about useless. AddToTree does not modify the reference value; it modifies the referenced object's state. Maybe there is more code involved that we need to see.
If your intent is to modify the child reference in the parent, you will have an issue with the ICollection<Category> Children object. You cannot use "ref" on an element in the ICollection to replace the reference in place; you would have to remove the child reference and add a new one.
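A minimal sketch of that remove-and-add replacement (the Category shape follows the question; the helper name is made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Category
{
    public Guid CategoryId;
    public string Name;
    public ICollection<Category> Children = new List<Category>();
}

static class CategoryOps
{
    // ICollection has no indexer and no ref access to elements, so replacing
    // a child means removing the old reference and adding the new one.
    public static bool ReplaceChild(Category parent, Guid childId, Category replacement)
    {
        var old = parent.Children.FirstOrDefault(c => c.CategoryId == childId);
        if (old == null) return false;
        parent.Children.Remove(old);
        parent.Children.Add(replacement);
        return true;
    }
}
```

Note that this loses the child's position if the underlying collection is ordered; with a List<Category> you could use IndexOf and the indexer instead.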
I was wondering if this code is good enough or if there are glaring newbie no-no's.
Basically I'm populating a TreeView listing all Departments in my database. Here is the Entity Framework model:
Here is the code in question:
private void button1_Click(object sender, EventArgs e)
{
DepartmentRepository repo = new DepartmentRepository();
var parentDepartments = repo.FindAllDepartments()
.Where(d => d.IDParentDepartment == null)
.ToList();
foreach (var parent in parentDepartments)
{
TreeNode node = new TreeNode(parent.Name);
treeView1.Nodes.Add(node);
var children = repo.FindAllDepartments()
.Where(x => x.IDParentDepartment == parent.ID)
.ToList();
foreach (var child in children)
{
node.Nodes.Add(child.Name);
}
}
}
EDIT:
Good suggestions so far. Working with the entire collection makes sense I guess. But what happens if the collection is huge as in 200,000 entries? Wouldn't this break my software?
DepartmentRepository repo = new DepartmentRepository();
var entries = repo.FindAllDepartments();
var parentDepartments = entries
.Where(d => d.IDParentDepartment == null)
.ToList();
foreach (var parent in parentDepartments)
{
TreeNode node = new TreeNode(parent.Name);
treeView1.Nodes.Add(node);
var children = entries.Where(x => x.IDParentDepartment == parent.ID)
.ToList();
foreach (var child in children)
{
node.Nodes.Add(child.Name);
}
}
Since you are getting all of the departments anyway, why don't you do it in one query where you get all of the departments and then execute queries against the in-memory collection instead of the database. That would be much more efficient.
In a more general sense, any database model that is recursive can lead to issues, especially if this could end up being a fairly deep structure. One possible thing to consider would be for each department to store all of its ancestors so that you would be able to get them all at once instead of having to query for them all at once.
In light of your edit, you might want to consider an alternative database schema that scales to handle very large tree structures.
There's a explanation on the fogbugz blog on how they handle hierarchies. They also link to this article by Joe Celko for more information.
Turns out there's a pretty cool solution for this problem explained by Joe Celko. Instead of attempting to maintain a bunch of parent/child relationships all over your database -- which would necessitate recursive SQL queries to find all the descendents of a node -- we mark each case with a "left" and "right" value calculated by traversing the tree depth-first and counting as we go. A node's "left" value is set whenever it is first seen during traversal, and the "right" value is set when walking back up the tree away from the node. A picture probably makes more sense:
The Nested Set SQL model lets us add case hierarchies without sacrificing performance.
How does this help? Now we just ask for all the cases with a "left" value between 2 and 9 to find all of the descendents of B in one fast, indexed query. Ancestors of G are found by asking for nodes with "left" less than 6 (G's own "left") and "right" greater than 6. Works in all databases. Greatly increases performance -- particularly when querying large hierarchies.
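The range check can be sketched in a few lines of C# against in-memory nodes; the node shape is illustrative, and the left/right values are assumed to have been precomputed by a depth-first walk as described above:

```csharp
using System.Collections.Generic;
using System.Linq;

class Node { public string Name; public int Left; public int Right; }

static class NestedSet
{
    // Descendants of a node are exactly those whose Left falls strictly
    // between the node's Left and Right: one range scan, no recursion.
    public static List<Node> Descendants(List<Node> all, Node root)
        => all.Where(n => n.Left > root.Left && n.Left < root.Right).ToList();

    // Ancestors are the nodes whose interval strictly encloses this node's.
    public static List<Node> Ancestors(List<Node> all, Node node)
        => all.Where(n => n.Left < node.Left && n.Right > node.Right).ToList();
}
```

In SQL both methods become a single BETWEEN-style predicate over indexed Left/Right columns, which is where the performance win comes from.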
Assuming that you are getting the data from a database, the first thing that comes to mind is that you are going to hit the database n+1 times for as many parents as you have in the database. You should try to get the whole tree structure out in one hit.
Secondly, you seem to get the idea of patterns, seeing as you appear to be using the repository pattern, so you might want to look at IoC. It allows you to inject your dependency on a particular object, such as your repository, into the class where it is going to be used, allowing for easier unit testing.
Thirdly, regardless of where you get your data from, move the structuring of the data into a tree into a service which returns an object containing all your departments already organised (this basically becomes a DTO). This will help you reduce code duplication.
With anything, you need to apply the YAGNI principle. This basically says that you should only do something if you are going to need it, so if the code you have provided above is complete, needs no further work and is functional, don't touch it. The same goes for the select n+1 performance issue: if you are not seeing any performance hits, don't do anything, as it may be premature optimization.
In your edit
DepartmentRepository repo = new DepartmentRepository();
var entries = repo.FindAllDepartments();
var parentDepartments = entries.Where(d => d.IDParentDepartment == null).ToList();
foreach (var parent in parentDepartments)
{
TreeNode node = new TreeNode(parent.Name);
treeView1.Nodes.Add(node);
var children = entries.Where(x => x.IDParentDepartment == parent.ID).ToList();
foreach (var child in children)
{
node.Nodes.Add(child.Name);
}
}
You still have an n+1 issue. This is because the data is only retrieved from the database when you call ToList() or iterate over the enumeration. This would be better:
var entries = repo.FindAllDepartments().ToList();
var parentDepartments = entries.Where(d => d.IDParentDepartment == null);
foreach (var parent in parentDepartments)
{
TreeNode node = new TreeNode(parent.Name);
treeView1.Nodes.Add(node);
var children = entries.Where(x => x.IDParentDepartment == parent.ID);
foreach (var child in children)
{
node.Nodes.Add(child.Name);
}
}
That looks OK to me, but think about a collection of hundreds of thousands of nodes. The best way to handle that is asynchronous loading; notice that you don't necessarily have to load all elements at the same time. Your tree view can be collapsed by default and you can load additional levels as the user expands the tree's nodes. Consider this case: you have a root node containing 100 nodes, and each of these nodes contains at least 1000 nodes. That is 100 * 1000 = 100,000 nodes to load, which is quite a lot, isn't it? To reduce the database traffic you can first load the 100 top-level nodes and then, when the user expands one of them, load its 1000 children. That will save a considerable amount of time.
Things that come to mind:
It looks like .ToList() is needless. If you are simply iterating over the returned result, why bother with the extra step?
Move this function into its own thing and out of the event handler.
As others have said, you could get the whole result in one call. Sort by IDParentDepartment so that the null ones come first. That way you can get the list of departments in one call and iterate over it only once, adding child departments to already-existing parent ones.
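Assuming only two levels, as in the posted code, that single sorted pass can be sketched like this, with a dictionary standing in for the TreeView (the names are illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;

class Department { public int ID; public int? IDParentDepartment; public string Name; }

static class SinglePass
{
    // One pass over the rows, parents first (IDParentDepartment == null sorts
    // ahead), attaching each child to its already-created parent.
    public static Dictionary<int, List<string>> Build(IEnumerable<Department> departments)
    {
        var childrenByParent = new Dictionary<int, List<string>>();
        foreach (var d in departments.OrderBy(d => d.IDParentDepartment.HasValue))
        {
            if (d.IDParentDepartment == null)
                childrenByParent[d.ID] = new List<string>();              // root: create its bucket
            else
                childrenByParent[d.IDParentDepartment.Value].Add(d.Name); // child: attach to parent
        }
        return childrenByParent;
    }
}
```

In the real code the buckets would be TreeNode objects kept in the dictionary instead of name lists; the sort guarantees every parent node exists before its first child arrives.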
Wrap the TreeView modifications with:
treeView.BeginUpdate();
// modify the tree here.
treeView.EndUpdate();
To get better performance.
Pointed out here by jgauffin
This should use only one (albeit possibly large) call to the database:
Departments.Join(
    Departments,
    child => child.IDParentDepartment,
    parent => parent.ID,
    (child, parent) => new { Child = child, Parent = parent }
).GroupBy(x => x.Parent)
 .Map(x =>
 {
     var node = new TreeNode(x.Key.Name);
     x.Map(y => node.Nodes.Add(y.Child.Name));
     treeView1.Nodes.Add(node);
 });
Where 'Map' is just a 'ForEach' for IEnumerables:
public static void Map<T>(this IEnumerable<T> source, Action<T> func)
{
foreach (T i in source)
func(i);
}
Note: This will still not help if the Departments table is huge as 'Map' materializes the result of the sql statement much like 'ToList()' does. You might consider Piotr's answer.
In addition to Bronumski's and Keith Rousseau's answers:
Also store the DepartmentID with the nodes (in the Tag property) so that you don't have to re-query the database to get the DepartmentID.