Recursive Linq query - Person > Manager > Dept

Say I have a dataset such as this:
PersonId | ManagerId | DepartmentId
=====================================
1        | null      | 1
2        | 1         | 1
3        | 1         | 2
4        | 2         | 1
and so on.
I am looking for a Linq query which, given a ManagerId and a set of DepartmentIds, will give me all relevant PersonIds. The query should return all PersonIds under a manager, all the way down the tree, not just those immediately under that manager.
Here's what I've tried so far: http://pastebin.com/zF9dq6wj
Thanks!
Chris.

Using Linq, there's no automatic way to do this (that I've ever heard of) without multiple trips to the database. As such, it's really no different from any other recursive call structure, and you can choose between recursive method calls, a while loop with a System.Collections.Queue (or Stack) object for ids, etc. If your backend database is SQL Server 2008 or higher, you can make use of its recursive query capabilities, but you'll have to call a sproc to do it, as Linq won't be able to make the translation itself.
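To make the queue-based option concrete, here is a minimal sketch that runs against rows already loaded into memory. The PersonRow class and the OrgTree.GetReports method are hypothetical names invented for this illustration, not anything from the question or from Linq itself. Against a real database each level of the loop would typically be one round trip rather than a lookup.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the columns in the question.
public sealed class PersonRow
{
    public int PersonId { get; set; }
    public int? ManagerId { get; set; }
    public int DepartmentId { get; set; }
}

public static class OrgTree
{
    // Breadth-first walk: start at the manager and keep expanding
    // direct reports until nobody new turns up.
    public static List<int> GetReports(
        IEnumerable<PersonRow> people, int managerId, ISet<int> departmentIds)
    {
        // Group rows by manager so each level is a single lookup.
        var byManager = people
            .Where(p => p.ManagerId.HasValue)
            .ToLookup(p => p.ManagerId.Value);

        var result = new List<int>();
        var visited = new HashSet<int> { managerId }; // guards against cycles in bad data
        var pending = new Queue<int>();
        pending.Enqueue(managerId);

        while (pending.Count > 0)
        {
            foreach (var report in byManager[pending.Dequeue()])
            {
                if (!visited.Add(report.PersonId)) continue;

                // Descend even when the department does not match, because
                // a matching person may sit further down the tree.
                pending.Enqueue(report.PersonId);
                if (departmentIds.Contains(report.DepartmentId))
                    result.Add(report.PersonId);
            }
        }
        return result;
    }
}
```

With the four sample rows from the question, calling GetReports for manager 1 and department set { 1 } should walk through persons 2, 3 and 4 and keep only those in department 1.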

You can't do recursive queries in Linq2SQL or Linq2Entities. I would suggest writing a view with a CTE and adding it to your DataContext file.

Related

How can we retrieve Link list from a table

I have SQL Server table structure like below
Id | title | nextID
-------------------
1  | w     | 2
2  | x     | 3
3  | y     | 4
4  | z     | null
How can I get the result in the form of a linked list by using Entity Framework?
Like this:
Id:1
title:w
nextId:2
nextNode => Id:2
title:x
nextId:3
nextNode => Id:3
title:y
nextId:4
nextNode => Id:4
title:z
nextId:null
nextNode:null

Typically, you would need to get the relevant rows out first, and then form the linked list yourself. Assuming that the data is more complex in reality (i.e. there exist other rows that aren't in the same chain), this makes it harder to query - you'd either need to perform multiple round-trips (as you iteratively discover the next link in the chain), or you'd need to write your own recursive CTE (or a while loop in SQL, if you prefer) to fetch the entire chain in one go. In either scenario, EF isn't really going to go out of its way to help you do this - you're going to have to do that yourself. And by the time you're doing that, I wonder whether it might make more sense (or at least: sense) to switch to hierarchyid as the implementation; as I understand it: this should allow you to query everything in the same hierarchy in a single query (noting that in your case, each level in the hierarchy would only have at most a single child)
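The "get the rows out first, then form the linked list yourself" step might look roughly like the sketch below. ChainRow, ChainNode and ChainBuilder are hypothetical types made up for the illustration, and the rows are assumed to have been fetched already (by whichever of the approaches above).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a fetched table row.
public sealed class ChainRow
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int? NextId { get; set; }
}

// Hypothetical in-memory node with a reference to its successor.
public sealed class ChainNode
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int? NextId { get; set; }
    public ChainNode NextNode { get; set; }
}

public static class ChainBuilder
{
    // Builds every node once, then wires NextNode by looking up NextId.
    // Returns the node for headId, or null when that id is not present.
    public static ChainNode Build(IEnumerable<ChainRow> rows, int headId)
    {
        var nodes = rows.ToDictionary(
            r => r.Id,
            r => new ChainNode { Id = r.Id, Title = r.Title, NextId = r.NextId });

        foreach (var node in nodes.Values)
        {
            // A NextId pointing at a row that was not fetched leaves NextNode null.
            if (node.NextId.HasValue && nodes.TryGetValue(node.NextId.Value, out var next))
                node.NextNode = next;
        }

        return nodes.TryGetValue(headId, out var head) ? head : null;
    }
}
```

Wiring through a dictionary keeps the pass linear in the number of rows and tolerates rows from other chains being in the same result set.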

SQL Linq .Take() latest 20 rows from HUGE database, performance-wise

I'm using Entity Framework 6 and I make Linq queries from an ASP.NET server to an Azure SQL database.
I need to retrieve the latest 20 rows that satisfy a certain condition.
Here's a rough example of my query:
using (PostHubDbContext postHubDbContext = new PostHubDbContext())
{
    DbGeography location = DbGeography.FromText(string.Format("POINT({1} {0})", latitude, longitude));
    IQueryable<Post> postQueryable =
        from postDbEntry in postHubDbContext.PostDbEntries
        orderby postDbEntry.Id descending
        where postDbEntry.OriginDbGeography.Distance(location) < (DistanceConstant)
        select new Post(postDbEntry);
    postQueryable = postQueryable.Take(20);
    IOrderedQueryable<Post> postOrderedQueryable = postQueryable.OrderBy(Post => Post.DatePosted);
    return postOrderedQueryable.ToList();
}
The question is: what if I literally have a billion rows in my database? Will that query brutally select the millions of rows that meet the condition and then take 20 of them? Or will it be smart and realise that I only want 20 rows, and hence select only 20 rows?
Basically, how do I make this query work efficiently with a database that has a billion rows?

According to http://msdn.microsoft.com/en-us/library/bb882641.aspx, the Take() function has deferred streaming execution, as does Select. This means that it should be equivalent to TOP 20 in SQL, and SQL Server will return only 20 rows from the database.
This link: http://msdn.microsoft.com/en-us/library/bb399342(v=vs.110).aspx shows that Take has a direct translation in Linq-to-SQL.
So the only performance gains you can make are in the database. As @usr suggested, you can use indexes to increase performance. Storing the table in sorted order also helps a lot (which is likely your case, as you sort by Id).
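To see what "deferred streaming execution" means in practice, here is a small LINQ to Objects sketch (not Entity Framework, where the provider turns the whole expression into SQL instead). TakeDemo and its members are hypothetical names for this illustration. The source is endless, so the call can only return if Take stops pulling items once it has 20.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    // Counts how many items the source was actually asked to produce.
    public static int Produced;

    // An endless source: enumerating it to the end would never finish.
    public static IEnumerable<int> Numbers()
    {
        for (var i = 0; ; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Where and Take both stream, so the source is only pulled until
    // Take has seen 20 matching items.
    public static List<int> FirstTwentyEven()
    {
        Produced = 0;
        return Numbers().Where(n => n % 2 == 0).Take(20).ToList();
    }
}
```

Note that this only illustrates the streaming operators. An OrderBy in front of Take, as in the question, cannot stream in LINQ to Objects, which is why the database-side plan (and its indexes) is what matters for the real query.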

Why not try it? :) You can inspect the SQL that it generates, then look at the execution plan for that SQL and see whether it scans the entire table.
Check out this question for more details:
How do I view the SQL generated by the Entity Framework?

This will be hard to get really fast. You want an index to give you the sort order on Id but you want a different (spatial) index to provide you with efficient filtering. It is not possible to create an index that fulfills both goals efficiently.
Assume both indexes exist:
If the filter is very selective, expect SQL Server to "select" all rows where this filter is true, then sort them, then give you the top 20. Imagine there are only 21 rows that pass the filter - then this strategy is clearly very efficient.
If the filter is not at all selective, SQL Server will rather traverse the table ordered by Id, test each row it comes by, and output the first 20 that pass. Imagine that the filter applies to all rows - then SQL Server can just output the first 20 rows it sees. Very fast.
So for 100% or 0% selectivity the query will be fast. In between there are nasty mixtures. If that is your situation, this question requires further thought. You probably need more than a clever indexing strategy. You need app changes.
Btw, we don't need an index on DatePosted. The sorting by DatePosted is only done after limiting the set to 20 rows. We don't need an index to sort 20 rows.

LINQ: Translating a SQL WITH clause to LINQ and Entity Framework

I have an app using Entity Framework. I want to add a tree view listing products, grouped by their categories. I have an old SQL query that will grab all of the products and categories and arrange them into parent nodes and children. I am trying to translate it into LINQ that uses the EF. But the SQL has a WITH sub-query that I am not familiar with using. I have tried using Linqer and LinqPad to sort it out, but they choke on the WITH clause and I am not sure how to fix it. Is this sort of thing possible in LINQ?
Here is the query:
declare @id int;
set @id=0;
WITH ChildIDs(id,parentid,type,ChildLevel) AS
(
    SELECT id,parentid,type,0 AS ChildLevel
    FROM dbo.brooks_product
    WHERE id = @id
    UNION ALL
    SELECT e.id,e.parentid,e.type,ChildLevel + 1
    FROM dbo.brooks_product AS e
    INNER JOIN ChildIDs AS d
        ON e.parentid = d.id
    WHERE showitem='yes' AND tribflag=1
)
SELECT ID,parentid,type,ChildLevel
FROM ChildIDs
WHERE type in('product','productchild','productgroup','menu')
ORDER BY ChildLevel, type
OPTION (MAXRECURSION 10);
When I run the query, I get data that looks like this (a few thousand rows, truncated here):
ID     | parentid | type         | ChildLevel
35429  | 0        | menu         | 1
49205  | 0        | menu         | 1
49206  | 49205    | menu         | 2
169999 | 49206    | product      | 3
160531 | 169999   | productchild | 4
and so on.

The WITH block is a Common Table Expression, and in this case is used to create a recursive query.
This will be VERY difficult in Linq, as Linq doesn't play well with recursion. If you need all of the data in one result set, then a stored procedure would be easier. Another option is to do the recursion in C# (not in Linq but in a recursive function) and do multiple round-trips. The performance will not be as good, but if your result set is small it may not make much difference (and you will get a better object model).
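As a rough sketch of the "recursion in C#" option, the function below assigns a ChildLevel to each descendant, similar in spirit to the CTE. ProductRow and ProductTree are hypothetical names, the rows are assumed to be in memory already (in the multiple-round-trip variant each lookup would be a query), and maxDepth is an invented guard loosely mirroring MAXRECURSION 10. Unlike SQL Server, which raises an error when the limit is exceeded, this sketch simply stops descending.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type with the columns the CTE selects.
public sealed class ProductRow
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public string Type { get; set; }
}

public static class ProductTree
{
    // Returns every descendant of rootId paired with its depth,
    // where direct children of the root are at level 1.
    public static List<(ProductRow Row, int ChildLevel)> Descend(
        IEnumerable<ProductRow> rows, int rootId, int maxDepth = 10)
    {
        var byParent = rows.ToLookup(r => r.ParentId);
        var result = new List<(ProductRow Row, int ChildLevel)>();
        Visit(byParent, rootId, 1, maxDepth, result);
        return result;
    }

    private static void Visit(
        ILookup<int, ProductRow> byParent, int parentId, int level, int maxDepth,
        List<(ProductRow Row, int ChildLevel)> result)
    {
        if (level > maxDepth) return; // stop instead of recursing forever on cyclic data

        foreach (var child in byParent[parentId])
        {
            result.Add((child, level));
            Visit(byParent, child.Id, level + 1, maxDepth, result);
        }
    }
}
```

The result comes back in depth-first order, so a caller wanting the CTE's ordering would still apply OrderBy on ChildLevel and then Type, and filter on Type as the final SELECT does.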

You may be able to solve this using LINQ to Entities, but it is non-trivial and I suspect it will be very time consuming.
In situations like this, you may prefer to build a SQL View or Table-Valued Function that returns the results for which you're looking. Then import that View or Table-Valued Function into your EF model and you can pull data directly from it using LINQ.
Querying the View in LINQ is no different than querying a table.
To get data from a Table-Valued Function in LINQ, you pass the function's parameters in after the name of the function, like so:
var query = from tvf in _db.MyTableValuedFunction(parameters)
            select tvf;
EDIT
As suggested by @thepirat000, Table-Valued Function support is not available in Entity Framework versions prior to version 5. In order to use this functionality, EF must be running with .NET 4.5 or higher.

At the end of the day, I could not get this to work. I ended up writing out a SQL query dynamically and sending that straight to the database. It works fine, and I am not relying on any direct user input so there is no chance of SQL injection. But it seems so old school! For the rest of my program I am using EF and LINQ.
Thanks for the replies!

How can I conditionally add where clauses and filter children in a single linq query?

I'm using Entity Framework and building up a linq query so that it executes at the database, to minimize the data coming back. The query has some optional search criteria and some ordering that is applied every time. I am working with parents and children (the mummy and daddy type). The filter I am trying to implement is for the age of the children.
So if I have some data like so...
parent 1
- child[0].Age = 5
- child[1].Age = 10
parent 2
- child[0].Age = 7
- child[1].Age = 23
...and I specify a minimum age of 8, my intended result to display is...
parent 1
- child[1].Age = 10
parent 2
- child[1].Age = 23
...and if I specify a minimum age of 15 I intend to display...
parent 2
- child[1].Age = 23
I can re-create my expected result with this horrible query (which I assume is actually doing more than one query):
IQueryable<Parent> parents = context.Parents;
if (minimumChildAge.HasValue)
{
    parents = parents.Where(parent => parent.Children.Any(child => child.Age >= minimumChildAge.Value));
    foreach (var parent in parents)
    {
        // Keep only the children that meet the minimum age.
        parent.Children = parent.Children.Where(child => child.Age >= minimumChildAge.Value).ToList();
    }
}
parents = parents.OrderBy(x => x.ParentId).Take(50);
So I tried the other method instead...
var query = from parent in context.Parents
            select parent;
if (minimumChildAge.HasValue)
    query = from parent in query
            join child in context.Children
                on parent.ParentId equals child.ParentId
            where child.Age >= minimumChildAge.Value
            select parent;
query = query.OrderBy(x => x.ParentId).Take(50);
When I run this in linqpad the query generated looks good. So my question...
Is this the correct way of doing this? Is there a better way? It seems a bit funny that if I now specified a maximum age I would be writing the same joins and hoping that Entity Framework works it out. In addition, how does this impact lazy loading? I expect only the children which match the criteria to be returned. So when I do parent.Children, does Entity Framework know that it just queried these and that it's working on a filtered collection?

Assuming your context is backed by an entity framework database or similar, then yes, your first option is going to do more than one SQL query. When you begin executing the foreach it will run a SQL query to get the parent (since you've forced enumeration on the query). Then, for each attempt to populate the Children property of a single parent object it will make another database call.
The second form should only produce a single SQL query; it will have a ton of redundant data but it will use JOIN statements to bring back all of the parent and child data in a single SQL call, then enumerate through it and populate the data on the client side as needed.
A rule of thumb I tend to follow is that, if you have fewer than 4 nested tables in your query, try to run it all at once. Both SQL and Entity Framework's query parsers seem to be very, very efficient when producing joins at that level.
If you get much beyond that, the SQL queries that EF can produce may get messy, and SQL itself (assuming MSSQL) gets less effective when you have 5+ joins on a single query. There's no hard and fast limit, because it depends on a number of specific factors, but if I find myself needing very deep nesting I tend to break it up into smaller LINQ queries and recombine them client-side.
(Side note: you can reproduce your second query in method syntax easily enough, since that's what the compiler is going to end up doing anyway, by using the Join method, but the syntax for that can get very complex; I typically go with query syntax for anything more complex than a single method call.)
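For the "conditionally add where clauses" part, one possible shape is to compose the IQueryable step by step and project the filtered children into a separate result type, rather than assigning back to parent.Children. The sketch below is only an illustration: Parent, Child, ParentView and ParentSearch are hypothetical types, and it is exercised here against an in-memory list via AsQueryable. Whether a particular Entity Framework version translates the projection into a single SQL statement is an assumption that would need checking against the generated SQL.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Child
{
    public int Age { get; set; }
}

public sealed class Parent
{
    public int ParentId { get; set; }
    public List<Child> Children { get; set; }
}

// Hypothetical result type: a parent plus only the matching children.
public sealed class ParentView
{
    public int ParentId { get; set; }
    public IEnumerable<Child> Children { get; set; }
}

public static class ParentSearch
{
    public static List<ParentView> Search(IQueryable<Parent> parents, int? minimumChildAge)
    {
        var query = parents;

        // The optional criterion is only added to the query when supplied.
        if (minimumChildAge.HasValue)
        {
            var min = minimumChildAge.Value;
            query = query.Where(p => p.Children.Any(c => c.Age >= min));
        }

        // The ordering and paging are applied every time.
        return query
            .OrderBy(p => p.ParentId)
            .Take(50)
            .Select(p => new ParentView
            {
                ParentId = p.ParentId,
                Children = p.Children.Where(
                    c => !minimumChildAge.HasValue || c.Age >= minimumChildAge.Value)
            })
            .ToList();
    }
}
```

Because the filtered children live on the projected ParentView, nothing depends on lazy loading of parent.Children, and a maximum age could be added as one more optional Where in the same style.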

LINQ and selection rows from big database

I have a database that currently contains a table with about 100 rows. In future it will have not 100 but 1 000 000+ rows, so I have to be careful with the web application I'm developing now.
The problem is this: on a web page I need to create a paged list that shows records to the user. Here is a sample of the code I plan to use:
public IQueryable<MyTable> GetRows(int from, int to)
{
    var queryRes = (from row in SomeDataContext.MyTable
                    orderby row.id
                    select row).AsQueryable();
    return queryRes.Take(to).Skip(from);
}
It is only a sample of code. I did not run it.
The question is: what will happen in this case? I see two scenarios:
1. It will load all rows from the database, and on the server side only the records in the range from 'from' to 'to' will be returned; the others will be ignored. In this case my application will be in big trouble. Imagine loading 1 000 000 rows from the database every time. It would be a disaster.
2. It will construct a SQL request that returns only the rows I need without loading the others. That's exactly what I need.
I think it will be scenario 2, but I'm not sure and can't check it. Am I correct?

As a side-note, you don't have to call AsQueryable. It is enough to do
var queryRes = SomeDataContext.MyTable.OrderBy(r => r.Id);
return queryRes.Take(to).Skip(from);
And to answer your question - scenario 2 will be executed. You can always check the generated SQL by using the SQL Server Profiler, but in case you are using Entity Framework, you can even do queryRes.ToString(). And as @Aron correctly pointed out - the query will actually be executed against the database only when enumerating the results (e.g. calling queryRes.ToList()).
These questions address the issue of looking up the SQL code in more detail:
How to view generated SQL from Entity Framework?
exact sql query executed by Entity Framework

Strictly speaking, neither 1 nor 2 is correct. Running the code DOES NOT hit the database. It constructs an expression tree. The calling code can still modify the expression tree further without hitting the database.
With the IQueryable interface no SQL is run. It is at the point when you call IEnumerable.GetEnumerator() that the underlying Linq provider converts the WHOLE expression into a query (in this case a SQL query) and then runs it.
So, for example, with this code you could have:
void Main()
{
    var foo = from x in GetRows(10, 10)
              where x.Id > 1000
              select x;
    foreach(var f in foo)
    {
        //Stuff
    }
}
The SQL that is actually run will be closer to
SELECT a,b,c FROM
    (SELECT a,b,c, ROW_NUMBER() OVER (ORDER BY ...) as row_number
     FROM Table
     WHERE id > 1000) t0
WHERE t0.row_number BETWEEN 10 and 20;
To be honest you are going about this wrong. You don't need a GetRows method. I would directly call the Linq query when constructing the table itself. You should take a look at the IRepository pattern that MVC scaffolding uses.
Finally if this is meant to be called as a WebQuery for AJAX I would look at the two OData implementations in .net (WCF Data Services and WebAPI OData).

You are right.
Scenario 2 is what will happen, when the query is eventually executed.
I would suggest reversing the Take and Skip, so that you start with Skip:
queryRes.Skip(from).Take(to - from)
Debugging this method will not make any calls to the database. It just returns the query - not the result.
If you want to test exactly what will happen, try downloading LinqPad - it is a great tool for demystifying linq queries.
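One thing worth checking when swapping the order is what the second argument means, since Take receives a row count rather than an end index. The sketch below compares the variants over an in-memory sequence (LINQ to Objects, not a database); the Paging class and its method names are invented for the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Original order: keep the first `to` rows, then drop the first `from`.
    public static List<int> TakeThenSkip(IEnumerable<int> rows, int from, int to)
        => rows.Take(to).Skip(from).ToList();

    // Skip first: the same range needs a count of `to - from`.
    public static List<int> SkipThenTakeRange(IEnumerable<int> rows, int from, int to)
        => rows.Skip(from).Take(to - from).ToList();

    // Skip first with `to` passed straight through: `to` now acts as a page size.
    public static List<int> SkipThenTakeCount(IEnumerable<int> rows, int from, int to)
        => rows.Skip(from).Take(to).ToList();
}
```

Over the numbers 0 to 99 with from = 10 and to = 20, the first two methods should both give the ten rows 10 through 19, while the third gives the twenty rows 10 through 29.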
