I have a table with two columns, GroupId and ParentId (both are GUIDS). The table forms a hierarchy so I can look for a value in the “GroupId” filed, when I have found it I can look at its ParentId. This ParentId will also appear in the GroupId of a different record. I can use this to walk up the hierarchy tree from any point to the root (root is an empty GUID). What I’d like to do is get a list of records when I know a GroupId. This would be the record with the GroupId and all the parents back to the root record. Is this possible with Linq and if so, can anyone provide a code snippet?
LINQ is not designed to handle recursive selection.
It is certainly possible to write your own extension method to compensate for that in LINQ to Objects, but I've found that LINQ to Entities does not like functionality not easily translated into SQL.
Edit:
Funnily enough, LINQ to Entities does not complain about Matt Warren's take on recursion using LINQ here. You could do:
var result = db.Table.Where(item => item.GroupId == 5)
.Traverse(item => db.Table.Where(parent
=> item.ParentId == parent.GroupId));
using the extension method defined here:
static class LinqExtensions
{
public static IEnumerable<T> Traverse<T>(this IEnumerable<T> source,
Func<T,IEnumerable<T>> selector){
foreach(T item in source){
yield return item;
IEnumerable<T> children = selector(item);
foreach (T child in children.Traverse(selector))
{
yield return child;
}
}
}
Performace might be poor, though.
It's definitely possible with Linq, but you'd have to make a DB call for each level in the heirarchy. Not exactly optimal.
The other respondents are right - performance is going to be pretty bad on this, since you'll have to make multiple round-trips. This will be somewhat dependent on your particular case, however - is your tree deep and will be people be performing this operation often, for instance.
You may be well served by creating a stored procedure that does this (using a CTE), and wiring it up in the Entities Designer to return your particularly defined Entity.
Related
I am using the repository pattern + entity framework for a small project I'm working on. I needed to cut out lazy loading for performance reasons, but now I also need to include child entities in my db fetches. My current (working) solution is this:
protected MediaDbEntities MediaDb { get { return db ?? (db = DatabaseFactory.Get()); } }
public virtual IEnumerable<T> All(string[] childEntities = null)
{
var query = MediaDb.Set<T>();
foreach (var childEntity in childEntities)
{
query.Include(childEntity);
}
return query.ToList();
}
I would really like to explore the use of an aggregate in this case, but don't really know how to apply. I have only used aggregates for sums and arithmetic operations. Anybody have an answer I can learn from?
In this case, you're not actually aggregating any data so trying to create any sort of aggregate is the wrong answer.
There's nothing wrong with your current solution. It makes perfect sense the way it is and I, personally, would leave it the way it is.
I am currently working with Entity Framework 4 on a project that is using Table Per Hierarchy to represent one set of classes using a single table. The way this works is that this table represents states and different states are associated with different other classes.
So you might imagine it to look like this, ignoring the common fields all states share:
InactiveState
has a -> StateStore
ActiveState
has a -> ActivityLocation
CompletedState
has a -> CompletionDate
Each state has a collection of all the InventoryItems that belong to it.
Now each item in my inventory has many states, where the last one in the history is the current state. To save on lists I have a shortcut field that points to the current state of my Inventory:
public class InventoryItem : Entity
{
// whole bunch of other properties
public virtual ICollection<InventoryState> StateHistory { get; set; }
public virtual InventoryState LastState { get; set; }
}
The first problem I am having is when I want to find, for example, all the InventoryItems which are in the Active state.
It turns out Linq-To-Entities doesn't support GetType() so I can't use a statement like InventoryRepository.Get().Where( x => x.LastState.GetType() == someType ). I can use the is operator, but that requires a fixed type so rather than being able to have a method like this:
public ICollection<InventoryItem> GetInventoryItemsByState( Type state )
{
return inventoryRepository.Get().Where( x => x.LastState is state );
}
I have to run some kind of if statement based on the type before I make the Linq query, which feels wrong. The InventoryItem list is likely to get large, so I need to do this at the EF level, I can't pull the whole list into memory and use GetType(), for example.
I have two questions in this situation, connected closely enough that I think they can be combined as they probably reflect a lack of understanding on my part:
Is it possible to find a list of items that share a child table type using Linq To Entities?
Given that I am not using Lazy Loading, is it possible to Include related items for child table types using TPH so that, for example, if I have an InactiveState as the child of my InventoryItem I can preload the StateStore for that InactiveState?
Is it possible to find a list of items that share a child table type
using Linq To Entities?
I don't think it's possible in another way than using an if/switch that checks for the type and builds a filter expression using is T or OfType<T>. You could encapsulate this logic into an extension method for example to have a single place to maintain and a reusable method:
public static class Extensions
{
public static IQueryable<InventoryItem> WhereIsLastState(
this IQueryable<InventoryItem> query, Type state)
{
if (state == typeof(InactiveState))
return query.Where(i => i.LastState is InactiveState);
if (state == typeof(ActiveState))
return query.Where(i => i.LastState is ActiveState);
if (state == typeof(CompletedState))
return query.Where(i => i.LastState is CompletedState);
throw new InvalidOperationException("Unsupported type...");
}
}
To be used like this:
public ICollection<InventoryItem> GetInventoryItemsByState(Type state)
{
return inventoryRepository.Get().WhereIsLastState(state).ToList();
}
I don't know if it would be possible to build the i => i.LastState is XXX expression manually using the .NET Expression API and based on the Type passed into the method. (Would interest me too, to be honest, but I have almost no clue about expression manipulation to answer that myself.)
Given that I am not using Lazy Loading, is it possible to Include
related items for child table types using TPH so that, for example, if
I have an InactiveState as the child of my InventoryItem I can preload
the StateStore for that InactiveState?
I am not sure if I understand that correctly but generally eager loading with Include does not support any filtering or additional operations on specific included children.
One way to circumvent this limitation and still get the result in a single database roundtrip is using a projection which would look like this:
var result = context.InventoryItems
.Select(i => new
{
InventoryItem = i,
LastState = i.LastState,
StateStore = (i.LastState is InactiveState)
? (i.LastState as InactiveState).StateStore
: null
})
.AsEnumerable()
.Select(x => x.InventoryItem)
.ToList();
If the query is a tracked query (which it is in the example above) and the relationships are not many-to-many (they are not in your example) the context will fixup the relationships when the entities are loaded into the context, that is InventoryItem.LastState and InventoryItem.LastState.StateStore (if LastState is of type InactiveState) will be set to the loaded entities (as if they had been loaded with eager loading).
I am looking for a library that would accept a collection of objects and return an indexed data structure that would be optimised for fast querying.
This is probably better illustrated by an example:
public class MyClass
{
public sting Name {get;set;}
public double Number {get;set;}
public ... (Many more fields)
}
var dataStore = Indexer.Parse(myClassCollection).Index(x => x.Name).Index(x => x.Number).Index( x => x.SomeOtherProperty);
var queryResult = dataStore.Where( x => x.Name == "ABC").Where(x => x.Number == 23).Where( x => x.SomeOtherProperty == dateTimeValue);
The idea is that the query on the dataStore will be very fast, of the order of O(log n).
Using dictionaries of dictionaries starts getting complicated when you have more than 2 or 3 fields you want to index.
Is there a library that already exists that does something like this?
What about an object oriented database.
Sterling is a recommended option. It supports LINQ to Object so don't worry about queries and we have used it for a couple of medium projects with good results (it's pretty fast).
You should take a look at RaptorDB as well. Several versions, including a fully embedded version, can be found on CodeProject here.
You could use Lucene.NET which can also run fully in memory (though I'm not sure that's what you'd want). It supports lightning fast retrieval of documents based on field criteria.
So that actually gives you a document database. If you take that one step further, you end up with something like RavenDB (commercial).
I am wondering whether we could achieve this by creating a SortedDictionary for each of the indexed properties.
SortedDictionary<property, List<MyClass>>
Then parsing the Linq expression tree to find out which properties are being queried. We can retrieve the valid keys of the sortedDictionaries, and then loop through these keys to get a List for each sorted dictionary and then use Set operations such as Union() and Intersect() depending on whether the expression tree has OR or AND directives.
Then return the a List matching the search criteria.
If the query includes a property that is not indexed, execute the query with indexed properties first and then use normal Linq to finish it off.
The interesting bit then becomes parsing the expression tree.
Any thoughts on this approach?
To keep my code cleaner I often try to break down parts of my data access code in LINQ to SQL into private sub-methods, just like with plain-old business logic code. Let me give a very simplistic example:
public IEnumerable<Item> GetItemsFromRepository()
{
var setA = from a in this.dataContext.TableA
where /* criteria */
select a.Prop;
return DoSubquery(setA);
}
private IEnumerable<Item> DoSubQuery(IEnumerable<DateTimeOffset> set)
{
return from item in set
where /* criteria */
select item;
}
I'm sure no one's imagination would be stretched by imagining more complex examples with deeper nesting or using results of sets to filter other queries.
My basic question is this: I've seen some significant performance differences and even exceptions being thrown by just simply reorganizing LINQ to SQL code in private methods. Can anyone explain the rules for these behaviors so that I can make informed decisions about how to write efficient, clean data access code?
Some questions I've had:
1) When does passage of System.Linq.Table instace to a method cause query execution?
2) When does using a System.Linq.Table in another query cause execution?
3) Are there limits to what types of operations (Take, First, Last, order by, etc.) can be applied to System.Linq.Table passed a parameters into a method?
The most important rule in terms of LINQ-to-SQL would be: don't return IEnumerable<T> unless you must - as the semantic is unclear. There are two schools of thought beyond that:
if you return IQueryable<T>, it is composable, meaning the where from later queries is combined to make a single TSQL, but as a down-side, it is hard to fully test
otherwise, return List<T> or similar, so it is clear that everything beyond that point is LINQ-to-Objects
Currently, you are doing something in the middle: collapsing it to LINQ-to-Objects (via IEnumerable<T>), but without it being obvious - and keeping the connection open in the middle (again, only a problem because it isn't obvious)
Remove the implicit cast:
public IQueryable<Item> GetItemsFromRepository()
{
var setA = from a in this.dataContext.TableA
where /* criteria */
select a.Prop;
return DoSubquery(setA);
}
private IQueryable<Item> DoSubQuery(IQueryable<DateTimeOffset> set)
{
return from item in set
where /* criteria */
select item;
}
The implicit cast from IQueryable<Item> to IEnumerable<Item> is essentially the same as calling AsEnumerable() on your IQueryable<Item>. There are of course times when you want that, but you should leave things as IQueryable by default, so that the entire query can be performed on the database, rather than merely the GetItemsFromRepository() bit with the rest being done in memory.
The secondary questions:
1) When does passage of System.Linq.Table instace to a method cause query execution?
When something needs a final result, such as Max(), ToList(), etc. that is neither a queryable object, nor a loaded-as-it-goes enumerable.
Note though, that while AsEnumerable() does not cause query execution, it does mean that when execution does happen only that before the AsEnumerable() will be performed against the source datasource, this will then produce an on-demand in-memory datasource against which the rest will be performed.
2) When does using a System.Linq.Table in another query cause
execution?
The same as above. Table<T> implements IQueryable<T>. If you e.g. join two of them together, that won't yet cause anything to be executed.
3) Are there limits to what types of operations (Take,
First, Last, order by, etc.) can be applied to System.Linq.Table
passed a parameters into a method?
Those that are definted by IQueryable<T>.
Edit: Some clarification on the differences and similarities between IEnumerable and IQueryable.
Just about anything you can do on an IQueryable you can do on an IEnumerable and vice-versa, but how it's performed will be different.
Any given IQueryable implementation can be used in linq queries and will have all the linqy extension methods like Take(), Select(), GroupBy and so on.
Just how this is done, depends on the implementation. For example, System.Linq.Data.Table implements those methods by the query being turned into an SQL query, the results of which are turned into a objects on a as-loaded basis. So if mySource is a table then:
var filtered = from item in mySource
where item.ID < 23
select new{item.ID, item.Name};
foreach(var i in filtered)
Console.WriteLine(i.Name);
Gets turned into SQL like:
select id, name from mySourceTable where id < 23
And then an enumerator is created from that such that on each call to MoveNext() another row is read from the results, and a new anonymous object created from it.
On the other hand, if mySource where a List or a HashSet, or anything else that implements IEnumerable<T> but doesn't have its own query engine, then the linq-to-objects code will turn it into something like:
foreach(var item in mySource)
if(item.ID < 23)
yield return new {item.ID, item.Name};
Which is about as efficiently as that code could be done in memory. The results will be the same, but the way to get them, would be different:
Now, since all IQueryable<T> can be converted into the equivalent IEnumerable<T> we could, if we wanted to, take the first mySource (where execution happens in a database) and do the following instead:
var filtered = from item in mySource.AsEnumerable()
where item.ID < 23
select new{item.ID, item.Name};
Here, while there is still nothing executed against the database until we iterate through the results or call something that examines all of those results, once we do so, it's as if we split the execution into two separate steps:
var asEnum = mySource.AsEnumerable();
var filtered = from item in asEnum
where item.ID < 23
select new{item.ID, item.Name};
The implemenatation of the first line would be to execute the SQL SELECT * FROM mySourceTable, and the execution of the rest would be like the linq-to-objects example above.
It's not hard to see how, if the database contained 10 items with an id < 23, and 50,000 items with an id higher, this is now much, much less performant.
As well as offering the explicity AsEnumerable() method, all IQueryable<T> can be implicitly cast to IEnumerable<T>. This lets us do foreach on them and use them with any other existing code that handles IEnumerable<T>, but if we accidentally do it at in inappropriate time, we can make queries much slower, and this is what was happening when your DoSubQuery was defined to take an IEnumerable<DateTimeOffset> and return an IEnumerable<Item>; it implicitly called AsEnumerable() on your IQueryable<DateTimeOffset> and your IQueryable<Item> and caused what could have been performed on the database to be performed in-memory.
For this reason, 99% of the time, we want to keep dealing in IQueryable until the very last moment.
As an example of the opposite though, just to point out that AsEnumerable() and the casts to IEnumerable<T> aren't there out of madness, we should consider two things. The first is that IEnumerable<T> lets us do things that can't be done otherwise, such as joining two completely different sources that don't know about each other (e.g. two different databases, a database and an XML file, etc.)
Another is that sometimes IEnumerable<T> is actually more efficient too. Consider:
IQueryable<IGrouping<string, int>> groupingQuery = from item in mySource select item.ID group by item.Name;
var list1 = groupingQuery.Select(grp => new {Name=grp.Key, Count=grp.Count()}).ToList();//fine
foreach(var grp in groupingQuery)//disaster!
Console.WriteLine(grp.Count());
Here groupingQuery is set up as a queryable that does some grouping, but which hasn't been executed in anyway. When we create list1, then first we create a new IQueryable based on that, and the query engine does it's best to work out what the best SQL for it is, and comes up with something like:
select name, count(id) from mySourceTable group by name
Which is pretty efficiently performed. Then the rows are turned into objects, which are then put into a list.
On the other hand, with the second query, there isn't as natural a SQL conversion for a group by that doesn't perform aggregate methods on all of the non-grouped items, so the best the query engine can come up with is to first do:
select distinct name from mySourceTable,
And then for every name it receives, to do:
select id from mySourceTable where name = '{name found in last query goes here}'
And so on, should this mean 2 SQL queries, or 200,000.
In this case, we're much better working on mySource.AsEnumerable() because here it is more efficient to grab the whole table into memory first. (Even better still would be to work on mySource.Select(item => new {item.ID, item.Name}).AsEnumerable() because then we still only retrieve the columns we care about from the database, and switch to in-memory at that point).
The last bit is worth remembering because it breaks our rule that we should stay with IQueryable<T> as long as possible. It isn't something to worry about much, but it is worth keeping an eye on if you do grouping and find yourself with a very slow query.
I have a couple of areas in an application I am building where it looks like I may have to violate the living daylights out of the DRY (Don't Repeat Yourself) principle. I'd really like to stay dry and not get hosed and wondered if someone might be able to offer me a poncho. For background, I am using C#/.NET 3.51 SP1, Sql Server 2008, and Linq-to-Sql.
Basically, my situations revolve around the following scenario. I need to be able to retrieve either a filtered list of items from virtually any table in my database or I need to be able to retrieve a single item from any table in my database given the id of the primary key. I am pretty sure that the best solutions to these problems will involve a good dose of generics and/or reflection.
Here are the two challenges in a little more depth. (Please forgive the verbosity.)
Given a table name (or perhaps a pluralized table name), I would like to be able to retrieve a filtered list of elements in the table. Specifically, this functionality will be used with lookup tables. (There are approximately 50 lookup tables in this database. Additional tables will frequently be added and/or removed.) The current lookup tables all implement an interface (mine) called IReferenceData and have fields of ID (PK), Title, Description, and IsActive.
For each of these lookup tables, I need to sometimes return a list of all records. Other times I need to only return the active records. Any Linq-to-Sql data context automatically contains a List property for each and every TableName. Unfortunately, I don't believe I can use this in it's raw form because it is unfiltered, and I need to apply a filter on the IsActive property.
One option is to write code similar to the following for all 50 tables. Yuk!!!
public List<AAA> GetListAAA(bool activeOnly)
{
return AAAs.Where(b => b.IsActive == true || b.IsActive == activeOnly).OrderBy(c => c.Title).ToList();
}
This would not be terribly hard, but it does add a burden to maintenance.
Note: It is important that when the list is returned that I maintain the underlying data type. The records in these lookup tables may be modified, and I have to apply the updates appropriately.
For each of my 150 tables, I need to be able to retrieve an individual record (FirstOrDefault or SingleOrDefault) by its primary key id. Again, I would prefer not to write this same code many times. I would prefer to have one method that could be used for all of my tables.
I am not really sure what the best approach would be here. Some possibilities that crossed my mind included the following. (I don't have specific ideas for their implementation. I am simply listing them as food for thought.)
A. Have a method like GetTableNameItemByID (Guid id) on the data context. (Good)
B. Have an extension method like GetItem(this, string tableName, Guid id) on the data context. (Better)
C. Have a Generic method or extension method like GetItem (this, Table, Guid id). (I don't even know if this possible but it would be the cleanest to use.) (Best)
Additional Notes
For a variety of reasons, I have already created a partial class for my data context. It would certainly be acceptable if the methods were included in that partial class either as normal methods or in a separate static class for extension methods.
Since you already have a partial implementation of your data context, you could add:
public IQueryable<T> GetList<T>( bool activeOnly ) where T : class, IReferenceData
{
return this.GetTable<T>()
.Where( b => !activeOnly || b.isActive )
.OrderBy( c => c.Title );
}
Retaining the IQueryable character of the data will defer the execution of the query until you are ready to materialize it. Note that you may want to omit the default ordering or have separate methods with and without ordering to allow you to apply different orderings if you desire. If you leave it as an IQueryable, this is probably more valuable since you can use it with paging to reduce the amount of data actually returned (per query) if you desire.
There's a design pattern for your needs called "Generic Repository" .Using this pattern you'll get an IQueryable instead of a real list of your entities which lets you do some other stuff with your query as you go.The point is to let the business layer gets whatever it needs whenever it needs it in a generic approach.
You can find an example here.
Have you considered using a code generation tool? Have a look at CodeSmith. Using a tool like that or T4 will allow you to generate your filter functions automatically and should make them fairly easy to maintain.
I'm not sure the best link to provide for T4, but you could start with this video.
Would this meet your needs?
public static IEnumerable<T> GetList<T>(this IEnumerable<IReferenceData> items, bool activeOnly)
{
return items.Where(b => b.IsActive == true || b.IsActive == activeOnly).OrderBy(c => c.Title).Cast<T>().ToList();
}
You could use it like this:
IEnumerable<IReferenceData> yourList;
List<DerivedClass> filtered = yourList.GetList<DerivedClass>(true);
To do something like this without demanding interfaces etc, you can use dynamic Expressions; something like:
public static IList<T> GetList<T>(
this DataContext context, bool activeOnly )
where T : class
{
IQueryable<T> query = context.GetTable<T>();
var param = Expression.Parameter(typeof(T), "row");
if(activeOnly)
{
var predicate = Expression.Lambda<Func<T, bool>>(
Expression.Equal(
Expression.PropertyOrField(param, "IsActive"),
Expression.Constant(true,typeof(bool))
), param);
query = query.Where(predicate);
}
var selector = Expression.Lambda<Func<T, string>>(
Expression.PropertyOrField(param, "Title"), param);
return query.OrderBy(selector).ToList();
}