EF Query to Get Union of All Child Collections

EF Query to Get Union of All Child Collections - c#

Assuming I have an Entity Framework 4.2 class like this:
class Company
{
public int ID { get; set; }
public ICollection<Employee> Employees { get; set; }
}
And I have collection like this:
public ICollection<Company> Companies { get; set; }
What's an efficient way to build a list of all employees in the entire Companies collection? I do not need to worry about duplicates.
Note: I'm trying to do it without an Entity Framework context--just using the Companies collection. However, if this entails a huge performance hit, I could get a context if necessary.

I'm not sure what you're trying to achieve, but to query the data you'll want to use the DbContext.
You can configure you context to automatically load the related entities by setting LazyLoadingEnabled to true, or explicitly, i.e. when querying the data, include the path to the employees, using the Include extension provided with EF > 4.0.
As soon as all companies are there you could linq it using the SelectMany method. I haven't checked, but AFAIK, LINQ to SQL will execute the query as one bunch on demand.
If you insist not to expose your DbContext or if you use a different underlying context, I would suggest you to wrap it up in a repository that does the SelectMany behind the since, but your friend are Include and SelectMany.

Related

Entity Framework LINQ SQL Query Performance

Hello everyone I'm working on an API that returns a dish with its restaurant details from a database that has restaurants and their dishes.
I'm wondering if the following makes the query any efficient by converting the first, to second:
from res in _context.Restaurant
join resdish in _context.RestaurantDish
on res.Id equals resdish.RestaurantId
where resdish.RestaurantDishId == dishId
Second:
from resdish in _context.RestaurantDish
where resdish.RestaurantDishId == dishId
join res in _context.Restaurant
on resdish.RestaurantId equals res.Id
The reason why I'm debating this is because I feel like the second version filters to the single restaurant dish, then joining it, rather than joining all dishes then filtering.
Is this correct?

You can use a profiler on your database to capture the SQL in both cases, or inspect the SQL that EF generates and you'll likely find that the SQL in both cases is virtually identical. It boils down to how the reader (developers) interprets the intention of the logic.
As far as building efficient queries in EF goes, EF is an ORM meaning it offers to map between an object-oriented model and a relational data model. It isn't just an API to enable translating Linq to SQL. Part of the power for writing simple and efficient queries is through the use of navigation properties and projection. A Dish will be considered the property of a particular Restaurant, while a Restaurant has many Dishes on its menu. This forms a One-to-Many relationship in the database, and navigation properties can map this relationship in your object model:
public class Restaurant
{
[Key]
public int RestaurantId { get; set; }
// ... other fields
public virtual ICollection<Dish> Dishes { get; set; } = new List<Dish>();
}
public class Dish
{
[Key]
public int DishId { get; set; }
//[ForeignKey(nameof(Restaurant))]
//public int RestaurantId { get; set; }
public virtual Restaurant Restaurant { get; set; }
}
The FK propery for the Restaurant ID is optional and can be configured to use a Shadow Property. (One that EF knows about and generates, but isn't exposed in the Entity) I recommend using shadow properties for FKs mainly to avoid 2 sources of truth for relationships. (dish.RestaurantId and dish.Restaurant.RestaurantId) Changing the FK does not automatically update the relationship unless you reload the entity, and updating the relationship does not automatically update the FK until you call SaveChanges.
Now if you wanted to get a particular dish and it's associated restaurant:
var dish = _context.Dishes
.Include(d => d.Restaurant)
.Single(d => d.DishId == dishId);
This fetches both entities. Note that there is no need now to manually write Joins like you would with SQL. EF supports Join, but it should only be used in very rare cases where a schema isn't properly normalized/relational and you need to map loosely joined entities/tables. (Such as a table using an "OwnerId" that could join to a "This" or a "That" table based on a discriminator such as OwnerType.)
If you leave off the .Include(d => d.Restaurant) and have lazy loading enabled on the DbContext, then EF would attempt to automatically load the Restaurant if and when the first attempt of the code to access dish.Restaurant. This provides a safety net, but can incur some steep performance penalties in many cases, so it should be avoided or treated as a safety net, not a crutch.
Eager loading works well when dealing with single entities and their related data where you will need to do things with those relationships. For instance if I want to load a Restaurant and review, add/remove dishes, or load a Dish and possibly change the Restaurant. However, eager loading can come at a significant cost in how EF and SQL provides that related data behind the scenes.
By default when you use Include, EF will add an INNER or LEFT join between the associated tables. This creates a Cartesian Product between the involved tables. If you have 100 restaurants that have an average of 30 dishes each and select all 100 restaurants eager loading their dishes, the resulting query is 3000 rows. Now if a Dish has something like Reviews and there are an average of 5 reviews per dish and you eager load Dishes and Reviews, that would be a resultset of every column across all three tables with 15000 rows in total. You can hopefully appreciate how this can grow out of hand pretty fast. EF then goes through that Cartesian and populates the associated entities in the object graph. This can lead to questions about why "my query runs fast in SSMS but slow in EF" since EF can have a lot of work to do, especially if it has been tracking references from restaurants, dishes, and/or reviews to scan through and provide. Later versions of EF can help mitigate this a bit by using query splitting so instead of JOINs, EF can work out to fetch the related data using multiple separate SELECT statements which can execute and process a fair bit faster, but it still amounts to a lot of data going over the wire and needing memory to materialize to work with.
Most of the time though, you won't need ALL rows, nor ALL columns for each and every related entity. This is where Projection comes in such as using Select. When we pull back our list of restaurants, we might want to list the restaurants in a given city along with their top 5 dishes based on user reviews. We only need the RestaurantId & Name to display in these results, along with the Dish name and # of positive reviews. Instead of loading every column from every table, we can define a view model for Restaurants and Dishes for this summary View, and project the entities to these view models:
public class RestaurantSummaryViewModel
{
public int RestaurantId { get; set; }
public string Name { get; set; }
public ICollection<DishSummaryViewModel> Top5Dishes { get; set; } = new List<DishSummaryViewModel>();
}
public class DishSummaryViewModel
{
public string Name { get; set; }
public int PositiveReviewCount {get; set; }
}
var restaurants = _context.Restaurants
.Where(r => r.City.CityId == cityId)
.OrderBy(r => r.Name)
.Select(r => new RestaurantSummaryViewModel
{
RestaurantId = r.RestaurantId,
Name = r.Name,
Top5Dishes = r.Dishes
.OrderByDescending(d => d.Reviews.Where(rv => rv.Score > 3).Count())
.Select(d => new DishSummaryViewModel
{
Name = d.Name,
PositiveReviewCount = d.Reviews.Where(rv => rv.Score > 3).Count()
}).Take(5)
.ToList();
}).ToList();
Notice that the above Linq example doesn't use Join or even Include. Provided you follow a basic set of rules to ensure that EF can work out what you want to project down to SQL you can accomplish a fair bit producing far more efficient queries. The above statement would generate SQL to run across the related tables but would only return the fields needed to populate the desired view models. This allows you to tune indexes based on what data is most commonly needed, and also reduces the amount of data going across the wire, plus memory usage on both the DB and app servers. Libraries like Automapper and it's ProjectTo method can simplify the above statements even more, configuring how to select into the desired view model once, then replacing that whole Select( ... ) with just a ProjectTo<RestaurantSummaryViewModel>(config) where "config" is a reference to the Automapper configuration where it can resolve how to turn Restaurants and their associated entities into the desired view model(s).
In any case it should give you some avenues to explore with EF and learning what it can bring to the table to produce (hopefully:) easy to understand, and efficient query expressions.

nhibernate force separate queries instead of join

I'm new to nhibernate and I couldn't figure this one out.
I have an entity similiar to below class;
public class MotherCollection
{
public virtual int Id { get; set; }
public virtual string Name { get; set; }
public virtual ISet<Class1> Collection1 { get; set; }
public virtual ISet<Class2> Collection2 { get; set; }
public virtual ISet<Class3> Collection3 { get; set; }
public virtual ISet<Class4> Collection4 { get; set; }
}
There are numerous one to many relationships to other entities.
I configure this relation with below mappings;
HasMany(d => d.Collection1).KeyColumn("McId");
HasMany(d => d.Collection2).KeyColumn("McId");
HasMany(d => d.Collection3).KeyColumn("McId");
HasMany(d => d.Collection4).KeyColumn("McId");
Child classes are configured similiary;
References(c1=>c1.MotherCollection).Column("McId");
and so on.
When I query this entity from db, fetching all relationships, I get a huge query similar to this one :
SELECT * FROM MotherCollection mc
JOIN c1 on mc.Id=c1.mcId
JOIN c2 on mc.Id=c2.mcId
JOIN c3 on mc.Id=c3.mcId
JOIN c4 on mc.Id=c4.mcId
this query causes alot of duplicate rows and takes alot of time to execute.
I want nhibernate to somehow seperate this query to individual SELECT queries, like below
SELECT * FROM MotherCollection Where Id = #Id
SELECT * FROM c1 Where mcId = #Id
and such. A bit similar to how it happens when the collection is lazy loaded.
I managed to achive this behaviour by setting my desired collections as lazy, and accessing their First() property just before it exits my datalayer. However, I'm guessing there must be a more elegant way of doing this in Nhibernate.
I've tried queries similar to this:
var data = session.QueryOver<DataSet>().Fetch(d=>d.Collection1).Eager.Fetch(d=>d.Collection2).Eager....
Thank you.

You should issue 4 separate queries, each one fetching one collection.
And you should use session.Query. QueryOver is an older way of doing it. To use it add using NHibernate.Linq. I usually use the following extension method to prefetch collections:
static public void Prefetch<T>(this IQueryable<T> query)
{
// ReSharper disable once ReturnValueOfPureMethodIsNotUsed
query.AsEnumerable().FirstOrDefault();
}
And then use:
var data = session.Query<DataSet>().Fetch(d=>d.Collection1).ToList();
session.Query<DataSet>().Fetch(d=>d.Collection2).Prefetch();
session.Query<DataSet>().Fetch(d=>d.Collection3).Prefetch();
session.Query<DataSet>().Fetch(d=>d.Collection4).Prefetch();
Make sure to run the 4 queries before accessing the collections. That way when you access them they will all be initialized already. If you use regular lazy loading you will be initializing one collection for one object at a time.

This is called lazy/eager loading. You have two choices to select from:
1. Lazy load with multiple queries:
This will generate multiple queries. While lazy loading, NHibernate first generates the query to get all MotherCollection data and only IDs (without data) from any dependent tables. Then it generates new query with WHERE clause for Primary Key on dependent table. So, this leads to famous N+1 issue.
With this, referenced collections will NOT be filled by default. Those will get filled up when you first access them while ISession is still valid. This is similar to calling First() as you mentioned in your question.
Look at your HasMany configuration; you have not mentioned LazyLoad but it is default. So, with your current mapping, this is what is happening.
This is recommended by NHibernate.
2. Eager load with single complex query:
If you want to avoid multiple queries and retrieve all the data in one go, try something like following:
HasMany(d => d.Collection1).KeyColumn("McId").Inverse().Not.LazyLoad().Fetch.Join();
With this, referenced collections will be filled up (if data present in database) automatically.
Please note that this is against the NHibernate recommendation. Refer this link.
Instead, we keep the default behavior, and override it for a
particular transaction, using left join fetch in HQL. This tells
NHibernate to fetch the association eagerly in the first select, using
an outer join. In the ICriteria query API, you would use
SetFetchMode(FetchMode.Join).
If you ever feel like you wish you could change the fetching strategy
used by Get() or Load(), simply use a ICriteria query, for
example:
User user = (User) session.CreateCriteria<User>()
.SetFetchMode("Permissions", FetchMode.Join)
.Add( Expression.Eq("Id", userId) )
.UniqueResult();
A completely different way to avoid problems with N+1 selects is to
use the second-level cache.
Duplicate rows and Performance
This is actually a different problem. There are multiple ways to handle this; but it will need additional inputs from you. Before that, you should choose one option from above two. Thus this deserves a new question to be asked.
Refer this answer: https://stackoverflow.com/a/30748639/5779732

Entity Framework 6: virtual collections lazy loaded even explicitly loaded on a query

I have a problem with EF6 when trying to optimize the queries. Consider this class with one collection:
public class Client
{
... a lot of properties
public virtual List<Country> Countries { get; set; }
}
As you might know, with Lazy Loading I have this n+1 problem, when EF tries to get all the Countries, for each client.
I tried to use Linq projections; for example:
return _dbContext.Clients
.Select(client => new
{
client,
client.Countries
}).ToList().Select(data =>
{
data.client.Countries = data.Countries; // Here is the problem
return data.client;
}).ToList();
Here I'm using two selects: the first for the Linq projection, so EF can create the SQL, and the second to map the result to a Client class. The reason for that is because I'm using a repository interface, which returns List<Client>.
Despite the query is generated with the Countries in it, EF still is using Lazy Loading when I try to render the whole information (the same n+1 problem). The only way to avoid this, is to remove the virtual accessor:
public class Client
{
... a lot of properties
public List<Country> Countries { get; set; }
}
The issue I have with this solution is that we still want to have this property as virtual. This optimization is only necessary for a particular part of the application, whilst on the other sections we want to have this Lazy Loading feature.
I don't know how to "inform" EF about this property, that has been already lazy-loaded via this Linq projection. Is that possible? If not, do we have any other options? The n+1 problems makes the application to take several seconds to load like 1000 rows.
Edit
Thanks for the responses. I know I can use the Include() extension to get the collections, but my problem is with some additional optimizations I need to add (I'm sorry for not posting the complete example, I thought with the Collection issue would be enough):
public class Client
{
... a lot of properties
public virtual List<Country> Countries { get; set; }
public virtual List<Action> Actions { get; set; }
public virtual List<Investment> Investments { get; set; }
public User LastUpdatedBy {
get {
if(Actions != null) {
return Actions.Last();
}
}
}
}
If I need to render the clients, the information about the last update and the number of investments (Count()), with the Include() I practically need to bring all the information from the database. However, if I use the projection like
return _dbContext.Clients
.Select(client => new
{
client,
client.Countries,
NumberOfInvestments = client.Investments.Count() // this is translated to an SQL query
LastUpdatedBy = client.Audits.OrderByDescending(m => m.Id).FirstOrDefault(),
}).ToList().Select(data =>
{
// here I map back the data
return data.client;
}).ToList();
I can reduce the query, getting only the required information (in the case of LastUpdatedBy I need to change the property to a getter/setter one, which is not a big issue, as its only used for this particular part of the application).
If I use the Select() with this approach (projection and then mapping), the Include() section is not considered by EF.

If i understand correctly you can try this
_dbContext.LazyLoading = false;
var clientWithCountres = _dbContext.Clients
.Include(c=>c.Countries)
.ToList();
This will fetch Client and only including it Countries. If you disable lazy-loading the no other collection will load from the query. Unless you are specifying a include or projection.
FYI : Projection and Include() doesn't work together see this answer
If you are projection it will bypass the include.
https://stackoverflow.com/a/7168225/1876572

don't know what you want to do, you are using lambda expression not linq, and your second select it's unnecessary.
data.client is client, data.Countries is client.Countries, so data.client.Countries = data.Countries alway true.
if you don't want lazy load Countries, use _dbContext.Clients.Include("Countries").Where() or select ().

In order to force eager loading of virtual properties you are supposed to use Include extension method.
Here is a link to MSDN https://msdn.microsoft.com/en-us/library/jj574232(v=vs.113).aspx.
So something like this should work:
return _dbContext.Clients.Include(c=>c.Countries).ToList();

Im not 100% sure but I think your issue is that you are still maintaining a queryable for your inner collection through to the end of the query.
This queryable is lazy (because in the model it was lazy), and you havent done anything to explain that this should not be the case, you have simply projected that same lazy queryable into the result set.
I cant tell you off the top of my head what the right answer here is but I would try things around the following:
1 use a projection on the inner queriable also eg
return _dbContext.Clients
.Select(client => new
{
client,
Countries = client.Countries.Select(c=>c)// or a new Country
})
2 Put the include at the end of the query (Im pretty sure include applies to the result not the input. It definitally doesnt work if you put it before a projection) eg:
_dbContext.Clients
.Select(client => new
{
client,
client.Countries
}.Include(c=>c.Countries)`
3 Try specifying the enumeration inside the projection eg:
_dbContext.Clients
.Select(client => new
{
client,
Countries = client.Countries.AsEnumerable() //perhaps tolist if it works
}`
I do want to caviat this by saying that I havent tried any of the above but I think this will set you on the right path.
A note on lazy loading
IMO there are very few good use cases for lazy loading. It almost always causes too many queries to be generated, unless your user is following a lazy path directly on the model. Use it only with extreme caution and IMO not at all in request response (eg web) apps.

A quick question about aggregate relational objects in MVC

I'm reading through Pro ASP.NET MVC 3 Framework that just came out, and am a bit confused about how to handle the retrieval of aggregate objects from a data store. The book uses Entity framework, but I an considering using a mini-ORM (Dapper or PetaPoco). As an example, the book uses the following objects:
public class Member {
public string name { get; set; }
}
public class Item {
public int id { get; set; }
public List<Bid> bids { get; set; }
}
public class Bid {
public int id { get; set; }
public Member member { get; set; }
public decimal amount { get; set; }
}
As far as I'm into the book, they just mention the concept of aggregates and move on. So I am assuming you would then implement some basic repository methods, such as:
List<Item> GetAllItems()
List<Bid> GetBidsById(int id)
GetMemberById(int id)
Then, if you wanted to show a list of all items, their bids, and the bidding member, you'd have something like
List<Item> items = Repository.GetAllItems();
foreach (Item i in items) {
i.Bids = Repository.GetBidsById(i.id);
}
foreach (Bid b in items.Bids) {
b.Member = Repository.GetMemberById(b.id);
}
If this is correct, isn't this awfully inefficient, since you could potentially issue thousands of queries in a few seconds? In my non-ORM thinking mind, I would have written a query like
SELECT
item.id,
bid.id,
bid.amount,
member.name
FROM
item
INNER JOIN bid
ON item.id = bid.itemId
INNER JOIN member
ON bid.memberId = member.id
and stuck it in a DataTable. I know it's not pretty, but one large query versus a few dozen little ones seems a better alternative.
If this is not correct, then can someone please enlighten me as to the proper way of handling aggregate retrieval?

If you use Entity Framework for you Data Access Layer, read the Item entity and use the .Include() fluent method to bring the Bids and Members along for the ride.

An aggregate is a collection of related data. The aggregate root is the logical entry point of that data. In your example, the aggregate root is an Item with Bid data. You could also look at the Member as an aggregate root with Bid data.
You may use your data access layer to retrieve the object graph of each aggregate and transforming the data for your use in the view. You may even ensure you eager fetch all of the data from the children. It is possible to transform the data using a tool like AutoMapper.
However, I believe that it is better to use your data access layer to project the domain objects into the data structure you need for the view, whether it be ORM or DataSet. Again, to use your example, would you actually retrieve the entire object graph suggested? Do I need all items including their bids and members? Or do I need a list of items, number of bids, plus member name and amount for the current winning bid? When I need more data about a particular item, I can go retrieve that when the request is made.
In short, your intuition was spot-on that it is inefficient to retrieve all that data, when a projection would suffice. I would just urge you to limit the projection even further and retrieve only the data you require for the current view.

This would be handled in different ways depending on your data access strategy. If you were using NHibernate or Entity Framework, you can have the ORM automatically populate these properties for you eagerly, lazy load them, etc. Entity Framework calls them "Navigation Properties", I'm not sure that NHibernate has a specific name for these "child properties" or "child collections".
In old-school ADO.NET, you might do something like create a stored procedure that returns multiple result sets (one for the main object and other result sets for your child collections or related objects), which would let you avoid calling the database multiple times. You could then iterate over the results sets and hydrate your object with all its relationships with one database call, and inside of a single repository method.

Where ever in your system you do the data retrieval, you would program your orm of choice to do an eager fetch of the related objects (aggregates).

Using what kind of data access method depends on your project.
Convenience vs performance.
Using EF or Linq to SQL really boosts the coding speed. When talking about performance, you really should care about every sql statement you deliver to the database.
No ORM can do both.

You can treat the read (query) and the write (command) side of the model separately.
When you want to mutate the state of your Aggregate, you load the Aggregate Root (AR) via a repository, mutate its state using the intention revealing public methods on the AR, then save the AR with the repository back again.
On the read side however, you can be as flexible as you want. I don't know Entity Framework, but with NHibernate you could use the QueryOver API to generate flexible queries to populate DTO's designed to be consumed by the client, whether it be a service or a View. If you want more performance you could go with Dapper. You could even use Stored Procs that projects itself to a DTO, that way you can be as efficient in the DB layer as possible.

Is this Repository pattern efficient with LINQ-to-SQL?

I'm currently reading the book Pro Asp.Net MVC Framework. In the book, the author suggests using a repository pattern similar to the following.
[Table(Name = "Products")]
public class Product
{
[Column(IsPrimaryKey = true,
IsDbGenerated = true,
AutoSync = AutoSync.OnInsert)]
public int ProductId { get; set; }
[Column] public string Name { get; set; }
[Column] public string Description { get; set; }
[Column] public decimal Price { get; set; }
[Column] public string Category { get; set; }
}
public interface IProductsRepository
{
IQueryable<Product> Products { get; }
}
public class SqlProductsRepository : IProductsRepository
{
private Table<Product> productsTable;
public SqlProductsRepository(string connectionString)
{
productsTable = new DataContext(connectionString).GetTable<Product>();
}
public IQueryable<Product> Products
{
get { return productsTable; }
}
}
Data is then accessed in the following manner:
public ViewResult List(string category)
{
var productsInCategory = (category == null) ? productsRepository.Products : productsRepository.Products.Where(p => p.Category == category);
return View(productsInCategory);
}
Is this an efficient means of accessing data? Is the entire table going to be retrieved from the database and filtered in memory or is the chained Where() method going to cause some LINQ magic to create an optimized query based on the lambda?
Finally, what other implementations of the Repository pattern in C# might provide better performance when hooked up via LINQ-to-SQL?

I can understand Johannes' desire to control the execution of the SQL more tightly and with the implementation of what i sometimes call 'lazy anchor points' i have been able to do that in my app.
I use a combination of custom LazyList<T> and LazyItem<T> classes that encapsulate lazy initialization:
LazyList<T> wraps the IQueryable functionality of an IList collection but maximises some of LinqToSql's Deferred Execution functions and
LazyItem<T> will wrap a lazy invocation of a single item using the LinqToSql IQueryable or a generic Func<T> method for executing other code deferred.
Here is an example - i have this model object Announcement which may have an attached image or pdf document:
public class Announcement : //..
{
public int ID { get; set; }
public string Title { get; set; }
public AnnouncementCategory Category { get; set; }
public string Body { get; set; }
public LazyItem<Image> Image { get; set; }
public LazyItem<PdfDoc> PdfDoc { get; set; }
}
The Image and PdfDoc classes inherit form a type File that contains the byte[] containing the binary data. This binary data is heavy and i might not always need it returned from the DB every time i want an Announcement. So i want to keep my object graph 'anchored' but not 'populated' (if you like).
So if i do something like this:
Console.WriteLine(anAnnouncement.Title);
..i can knowing that i have only loaded from by db the data for the immediate Announcement object. But if on the following line i need to do this:
Console.WriteLine(anAnnouncement.Image.Inner.Width);
..i can be sure that the LazyItem<T> knows how to go and get the rest of the data.
Another great benefit is that these 'lazy' classes can hide the particular implementation of the underlying repository so i don't necessarily have to be using LinqToSql. I am (using LinqToSql) in the case of the app I'm cutting examples from, but it would be easy to plug another data source (or even completely different data layer that perhaps does not use the Repository pattern).
LINQ but not LinqToSql
You will find that sometimes you want to do some fancy LINQ query that happens to barf when the execution flows down to the LinqToSql provider. That is because LinqToSql works by translating the effective LINQ query logic into T-SQL code, and sometimes that is not always possible.
For example, i have this function that i want an IQueryable result from:
private IQueryable<Event> GetLatestSortedEvents()
{
// TODO: WARNING: HEAVY SQL QUERY! fix
return this.GetSortedEvents().ToList()
.Where(ModelExtensions.Event.IsUpcomingEvent())
.AsQueryable();
}
Why that code does not translate to SQL is not important, but just believe me that the conditions in that IsUpcomingEvent() predicate involve a number of DateTime comparisons that simply are far too complicated for LinqToSql to convert to T-SQL.
By using .ToList() then the condition (.Where(..) and then .AsQueryable() i'm effectively telling LinqToSql that i need all of the .GetSortedEvents() items even tho i'm then going to filter them. This is an instance where my filter expression will not render to SQL correctly so i need to filter it in memory. This would be what i might call the limitation of LinqToSql's performance as far as Deferred Execution and lazy loading goes - but i only have a small number of these WARNING: HEAVY SQL QUERY! blocks in my app and i think further smart refactoring could eliminate them completely.
Finally, LinqToSql can make a fine data access provider in large apps if you want it to. I found that to get the results i want and to abstract away and isolate certain things i've needed to add code here and there. And where i want more control over the actual SQL performance from LinqToSql, i've added smarts to get the desired results. So IMHO LinqToSql is perfectly ok for heavy apps that need db query optimization provided you understand how LinqToSql works. My design was originally based on Rob's Storefront tutorial so you might find it useful if you need more explanation about my rants above.
And if you want to use those lazy classes above, you can get them here and here.

Is this an efficient means of
accessing data? Is the entire table
going to be retrieved from the
database and filtered in memory or is
the chained Where() method going to
cause some LINQ magic to create an
optimized query based on the lambda?
It is efficient, if you wish to say so. The Repository exposes an IQueryable inteface, which basically represents any LINQ Data Provider (in this case Linq2Sql).
Queries are executed the moment you start iterating over the result.
IQueryable therefore supports query composition. You can add any .Where() or .GroupBy() or .OrderBy() call to a query and it will be statisfied by the database.
If you put an enumeration in your query, such as .ToList(), everything after that will happen in memory (LinqToObjects).
But I think the repository implementation is useless. I want my repository to control query execution, which is impossible when exposing IQueryable.

Yes linq2sql will generate magic to make it more efficient. It depends on you using the IQueryable interface. If you want to check clamp the SQL profiler on and you can see it generate the appropriate query.
I would recommend introducing a service layer to abstract away your dependancy on linq2sql.

I've also read that book recently and this is the SQL generated when I ran the sample code:
SELECT [t1].[Category]
FROM ( SELECT DISTINCT [t0].[Category]
FROM [Products] AS [t0] ) AS [t1] ORDER BY [t1].[Category]
I don't think you can write anything more efficient given that database. However in most real databases your Categories would be in a separate table to keep things DRY.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

EF Query to Get Union of All Child Collections - c#

Related

Entity Framework LINQ SQL Query Performance

nhibernate force separate queries instead of join

Entity Framework 6: virtual collections lazy loaded even explicitly loaded on a query

A quick question about aggregate relational objects in MVC

Is this Repository pattern efficient with LINQ-to-SQL?

Categories

Resources