Is this approach good for managing the DbContext?

I have a repository that asks for a DbContext in its constructor, and I use Ninject to resolve this dependency. I set the object scope to InRequestScope, which means one instance is created per HTTP request, but I'm not sure when an HTTP request actually happens. Is it when the app is being loaded, or when we call SaveChanges()?
My approach for managing the DbContext is this: the repository asks for a context, as I said, and the controller asks for this repository in its constructor:
public class PageGroupsController : Controller
{
IGenericRepository<PageGroup> _repository;
public PageGroupsController(IGenericRepository<PageGroup> repository)
{
_repository = repository;
}
// GET: Admin/PageGroups
public ActionResult Index()
{
return View(_repository.Get());
}
}
And the repository:
public class GenericRepository<TEntity> : IGenericRepository<TEntity> where TEntity : class
{
private DbContext _context;
public GenericRepository(DbContext context)
{
_context = context;
}
public IEnumerable<TEntity> Get()
{
return _context.Set<TEntity>().ToList();
}
}
And the NinjectWebCommon.cs, which is where I resolve the dependencies:
private static void RegisterServices(IKernel kernel)
{
kernel.Bind<DbContext>().To<MyCmsContext>().InRequestScope();
kernel.Bind<IGenericRepository<PageGroup>>().To<GenericRepository<PageGroup>>();
}
Is this approach any good? I didn't want to use using (var db = new DbContext()) { ... } all over the place in my controllers, and I didn't want to create a single context for the whole app either. Is this approach equivalent to the using approach (I mean querying what we need inside a using block), but with less coupling?

Each time a controller action is called from any web client, that is a request. So when someone visits your site and goes to /Pagegroups/Index, resolved through routing, that is a request. When you do a Form.Submit from the client, that is a request; when you make an Ajax call, that is a request.
Do you want a DbContext constructed for each request and scoped to it? Absolutely, and for no longer than a request. For simple applications, a using() block within each action is perfectly fine, though repeating it everywhere adds a bit of boilerplate. In more complex, long-lived applications that you might want to unit test, or whose logic benefits from being broken down into smaller shared components, using blocks make it messy to share the DbContext, so an injected DbContext scoped to the request serves that purpose just fine. Every class instance serving a request is given the exact same DbContext instance.
You don't want a DbContext scoped longer than a request (i.e. a singleton) because, while requests from one client may be sequential, requests from multiple users are not. Web servers respond to several user requests at a time on different threads, and EF's DbContext is not thread safe. This catches out new developers: everything seems to work on their machine when testing, only for errors to start popping up once the app is deployed to a server and handling concurrent requests.
Also, as DbContexts age, they get bigger and slower because they track more and more entity instances. This leads to gradual performance loss, as well as stale data, since a DbContext serves up cached instances that don't reflect changes made by other sources. A new development team might get caught out by the cross-thread issue and introduce locking or the like because they want to use EF's caching rather than a shorter lifespan (assuming DbContexts are "expensive" to create all the time; they're not!). This is often why teams call for abandoning EF because it's "slow", without realizing that their design decisions prevented them from taking advantage of most of EF's capabilities.
As a general tip I would strongly recommend avoiding the Generic Repository pattern when working with EF. It will give you no benefit other than pigeon-holing your data logic. The power of EF is in the ability to handle the translation of operations against Objects and their relationships down to SQL. It is not merely a wrapper to get down to data. Methods like this:
public IEnumerable<TEntity> Get()
{
return _context.Set<TEntity>().ToList();
}
are entirely counter-productive. Suppose you have tens of thousands of records that you want to order and paginate, and you do something like:
var items = repository.Get()
.OrderBy(x => x.CreatedAt)
.Skip(pageNumber * pageSize)
.Take(pageSize)
.ToList();
The problem is that your repository tells EF to load, track, and materialize the entire table before any sorting or pagination takes place. What's worse, if there was any filtering to be done (Where clauses based on search criteria, etc.), it wouldn't be applied until the repository had returned all of the records.
Instead, if you just had your controller method do this:
var items = _context.PageGroups
.OrderBy(x => x.CreatedAt)
.Skip(pageNumber * pageSize)
.Take(pageSize)
.ToList();
then EF would compose an SQL query that performed the ordering and fetched just that single page of entities. The same goes for taking advantage of Projection with Select to fetch back just the details you need, or eager loading related entities. Trying to do that with a generic repository gets either very complex (trying to pass expressions around, or lots of arguments to try and handle sorting, pagination, etc.) or very inefficient, often both.
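A minimal sketch of why that composition works, assuming an in-memory list wrapped with AsQueryable() as a stand-in for the DbSet (PagingSketch and SampleRows are made-up names for illustration, not part of EF). Nothing executes until ToList() is called; until then the query only carries an expression tree containing the OrderBy, Skip and Take calls, which is what a provider such as EF gets to translate into SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PageGroup
{
    public int PageGroupId { get; set; }
    public DateTime CreatedAt { get; set; }
}

public static class PagingSketch
{
    // Composes ordering and paging onto the query without executing it.
    // pageNumber is treated as zero-based, matching Skip(pageNumber * pageSize).
    public static IQueryable<PageGroup> Page(IQueryable<PageGroup> source, int pageNumber, int pageSize)
    {
        return source
            .OrderBy(x => x.CreatedAt)
            .Skip(pageNumber * pageSize)
            .Take(pageSize);
    }

    // Hypothetical helper: builds in-memory rows that stand in for the table.
    public static List<PageGroup> SampleRows(int count)
    {
        return Enumerable.Range(1, count)
            .Select(i => new PageGroup
            {
                PageGroupId = i,
                CreatedAt = new DateTime(2020, 1, 1).AddDays(i)
            })
            .ToList();
    }
}
```

Calling PagingSketch.Page(PagingSketch.SampleRows(25).AsQueryable(), 1, 10).Expression.ToString() shows the OrderBy, Skip and Take calls recorded on the query. A repository method that ends in ToList() has already executed before the caller gets a chance to add any of them.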
I recommend considering a repository for two reasons: unit testing, and handling low-level common filtering such as soft-delete (IsActive) and/or multi-tenancy (OwnerId) type data. Basically, any time the data has to conform to standard rules that a repository can enforce in one place. In these cases I recommend non-generic repositories that serve their respective controllers. For instance, if I have a ManagePageGroupsController, I'd have a ManagePageGroupsRepository to serve it. The key difference in this pattern is that the repository returns IQueryable<TEntity> rather than IEnumerable<TEntity> or even TEntity (unless it is the result of a "Create" method). This allows consumers to still handle sorting, pagination, projection, etc. as if they were working with the DbContext, while the repository ensures Where clauses are in place for low-level rules and asserts access rights, and it can easily be mocked out as a substitute for unit tests. (It is easier to mock a repository method that serves an IQueryable than to mock a DbContext/DbSet.) Unless your application is going to use unit tests, or has a few low-level common considerations like soft-deletes, I'd recommend not bothering with the complexity of abstracting the DbContext, and instead fully leveraging everything EF has to offer.
Edit: Expanding on IQueryable
Once you determine that a Repository serves a use for testing or base filtering like IsActive, you can avoid a lot of complexity by returning IQueryable rather than IEnumerable.
Consumers of a repository will often want to do things like filter results, sort results, paginate results, project results to DTOs / ViewModels, or otherwise use the results to perform checks like getting a count or checking if any items exist.
As covered above, a method like:
public IEnumerable<PageGroup> Get()
{
return _context.PageGroups
.Where(x => x.IsActive)
.ToList();
}
would return ALL items from the database to be stored in memory by the application server before any of these considerations were taken. If we want to support filtering:
public IEnumerable<PageGroup> Get(PageGroupFilters filters)
{
    var query = _context.PageGroups
        .Where(x => x.IsActive);
    if (!string.IsNullOrEmpty(filters.Name))
        query = query.Where(x => x.Name.StartsWith(filters.Name));
    // Repeat for any other supported filters.
    return query.ToList();
}
Then adding order by conditions:
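public IEnumerable<PageGroup> Get(PageGroupFilters filters, IEnumerable<OrderByCondition> orderBy)
{
    var query = _context.PageGroups
        .Where(x => x.IsActive);
    if (!string.IsNullOrEmpty(filters.Name))
        query = query.Where(x => x.Name.StartsWith(filters.Name));
    // Repeat for any other supported filters.
    // The first condition uses OrderBy; later ones use ThenBy so they
    // refine the ordering instead of replacing it.
    IOrderedQueryable<PageGroup> ordered = null;
    foreach (var condition in orderBy)
    {
        if (condition.Direction == Directions.Ascending)
            ordered = ordered == null ? query.OrderBy(condition.Expression) : ordered.ThenBy(condition.Expression);
        else
            ordered = ordered == null ? query.OrderByDescending(condition.Expression) : ordered.ThenByDescending(condition.Expression);
    }
    if (ordered != null)
        query = ordered;
    return query.ToList();
}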
then pagination:
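// pageNumber is zero-based; a pageSize of 0 means "no paging".
public IEnumerable<PageGroup> Get(PageGroupFilters filters, IEnumerable<OrderByCondition> orderBy, int pageNumber = 0, int pageSize = 0)
{
    var query = _context.PageGroups
        .Where(x => x.IsActive);
    if (!string.IsNullOrEmpty(filters.Name))
        query = query.Where(x => x.Name.StartsWith(filters.Name));
    // Repeat for any other supported filters.
    // The first condition uses OrderBy; later ones use ThenBy so they
    // refine the ordering instead of replacing it.
    IOrderedQueryable<PageGroup> ordered = null;
    foreach (var condition in orderBy)
    {
        if (condition.Direction == Directions.Ascending)
            ordered = ordered == null ? query.OrderBy(condition.Expression) : ordered.ThenBy(condition.Expression);
        else
            ordered = ordered == null ? query.OrderByDescending(condition.Expression) : ordered.ThenByDescending(condition.Expression);
    }
    if (ordered != null)
        query = ordered;
    if (pageSize != 0)
        query = query.Skip(pageNumber * pageSize).Take(pageSize);
    return query.ToList();
}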
You can hopefully see where this is going. You may just want a count of applicable entities, or check if at least one exists. As above this will still always return the list of Entities. If we have related entities that might need to be eager loaded, or projected down to a DTO/ViewModel, still much more work to be done or a memory/performance hit to accept.
Alternatively you can add multiple methods to handle filtering scenarios (GetAll vs. GetBySource, etc.) and pass Expression<Func<T, bool>> parameters to try to generalize the implementation. This adds considerable complexity or leaves gaps in what is available to consumers. Often the justification for the Repository pattern is to abstract the data logic (ORM) from the business logic. However, this either cripples the performance and/or capability of your system, or it becomes a lie the minute you introduce expressions through the abstraction. Any expression passed to the repository and fed to EF must conform to EF's rules (no custom functions or system methods that EF cannot translate to SQL, etc.), or you must add considerable complexity to parse and translate expressions within your repository to ensure everything will work. And then, on top of that, there is supporting synchronous vs. asynchronous calls. It adds up fast.
The alternative is IQueryable:
public IQueryable<PageGroup> Get()
{
return _context.PageGroups
.Where(x => x.IsActive);
}
Now when a consumer wants to add filtering, sorting, and pagination:
var pageGroups = Repository.Get()
    .Where(x => x.Name.StartsWith(searchText))
    .OrderBy(x => x.Name)
    .Skip(pageNumber * pageSize).Take(pageSize)
    .ToList();
If they simply want a count:
var pageGroupCount = Repository.Get()
    .Where(x => x.Name.StartsWith(searchText))
    .Count();
If we are dealing with a more complex entity like a Customer with Orders and OrderLines, we can eager load or project:
// Top 50 customers by order count.
var customers = ManageCustomerRepository.Get()
    .Select(x => new CustomerSummaryViewModel
    {
        CustomerId = x.Id,
        Name = x.Name,
        OrderCount = x.Orders.Count()
    }).OrderByDescending(x => x.OrderCount) // order on the projected count; the view model has no Orders
    .Take(50)
    .ToList();
Even if I commonly fetch items by ID and want a repository method like "GetById" I will return IQueryable<T> rather than T:
// The key type is assumed to be int here.
public IQueryable<PageGroup> GetById(int pageGroupId)
{
    return _context.PageGroups
        .Where(x => x.PageGroupId == pageGroupId);
    // rather than returning a PageGroup and using
    // return _context.PageGroups.SingleOrDefault(x => x.PageGroupId == pageGroupId);
}
Why? Because my caller can still take advantage of projecting the item down to a view model, decide if anything needs to be eager loaded, or do an action like an exists check using Any().
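As a rough sketch of what those callers can look like, assuming an in-memory sequence in place of _context.PageGroups (PageGroupRepositorySketch, PageGroupSummary and GetByIdCallers are hypothetical names for illustration, and the IsActive check is an assumed base rule):

```csharp
using System.Collections.Generic;
using System.Linq;

public class PageGroup
{
    public int PageGroupId { get; set; }
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

// Hypothetical view model used only for this illustration.
public class PageGroupSummary
{
    public string Name { get; set; }
}

public class PageGroupRepositorySketch
{
    private readonly IQueryable<PageGroup> _pageGroups;

    // An in-memory sequence stands in for _context.PageGroups here.
    public PageGroupRepositorySketch(IEnumerable<PageGroup> rows)
    {
        _pageGroups = rows.AsQueryable();
    }

    // Returns a query, not an entity, so callers decide how to finish it.
    public IQueryable<PageGroup> GetById(int pageGroupId)
    {
        return _pageGroups.Where(x => x.IsActive && x.PageGroupId == pageGroupId);
    }
}

public static class GetByIdCallers
{
    // Exists check: no entity needs to be materialized.
    public static bool Exists(PageGroupRepositorySketch repository, int id)
    {
        return repository.GetById(id).Any();
    }

    // Projection: only the fields the view needs are selected.
    public static PageGroupSummary GetSummary(PageGroupRepositorySketch repository, int id)
    {
        return repository.GetById(id)
            .Select(x => new PageGroupSummary { Name = x.Name })
            .SingleOrDefault();
    }
}
```

Against a real DbContext the same two callers would translate to an EXISTS-style query and a single-column SELECT respectively, instead of loading the whole entity first.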
The Repository does not abstract the DbContext to hide EF from the business logic, but rather to enable a base set of rules like the check for IsActive so we don't have to worry about adding .Where(x => x.IsActive) everywhere and the consequences of forgetting it. It's also easy to mock out. For instance to create a mock of our repository's Get method:
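// Get() must be virtual (or the mock must target an interface) for Moq to intercept it.
var mockRepository = new Mock<PageGroupRepository>();
mockRepository.Setup(x => x.Get())
    .Returns(buildSamplePageGroups().AsQueryable()); // List<PageGroup> exposed as IQueryable<PageGroup>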
where the buildSamplePageGroups method holds code that builds the set of test data suitable for the test. That method returns a List<PageGroup> containing the test data. This only gets a bit more complex from a testing perspective if you need to support async operations against the repository. This requires a suitable container for the test data rather than List<T>.
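If you would rather not depend on a mocking library, the same idea can be sketched with a hand-written fake. This assumes the repository is hidden behind an interface; IPageGroupRepository, FakePageGroupRepository and PageGroupSearch are hypothetical names for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class PageGroup
{
    public int PageGroupId { get; set; }
    public string Name { get; set; }
}

// Assumed abstraction over the repository; not part of the original code.
public interface IPageGroupRepository
{
    IQueryable<PageGroup> Get();
}

// Test double: serves a plain List<PageGroup> as an IQueryable.
public class FakePageGroupRepository : IPageGroupRepository
{
    private readonly List<PageGroup> _rows;

    public FakePageGroupRepository(List<PageGroup> rows)
    {
        _rows = rows;
    }

    public IQueryable<PageGroup> Get()
    {
        return _rows.AsQueryable();
    }
}

// The code under test only knows about the interface.
public class PageGroupSearch
{
    private readonly IPageGroupRepository _repository;

    public PageGroupSearch(IPageGroupRepository repository)
    {
        _repository = repository;
    }

    public List<string> NamesStartingWith(string prefix)
    {
        return _repository.Get()
            .Where(x => x.Name.StartsWith(prefix))
            .OrderBy(x => x.Name)
            .Select(x => x.Name)
            .ToList();
    }
}
```

One caveat with either approach: the fake runs LINQ to Objects, so it will happily execute expressions that EF could not translate to SQL. It tests the consumer's logic, not the query translation.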
Edit 2: Generic Repositories.
The issue with generic repositories is that you end up compartmentalizing your entities when, through details like navigation properties, they are related. In creating an order you deal with customers, addresses, orders, products, etc., where the act of creating an order generally only needs a subset of information about these entities. If I have a ManageOrdersController to handle editing and creating orders, and generic repositories, I end up with dependencies on several repositories for Order, Customer, Product, and so on.
The typical argument for generic repositories is the Single Responsibility Principle (SRP) and Don't Repeat Yourself (DRY). An OrderRepository is responsible only for orders; a CustomerRepository is responsible only for customers. However, you could equally argue that organizing repositories this way breaks SRP, because the principle behind SRP is that the code within should have one, and only one, reason to change. Especially without an IQueryable implementation, a repository whose methods are used by several different controllers and related services has many potential reasons to change, as each controller has different concerns about the actions and output of the repository. DRY is a different argument and comes down to preference. The key to DRY is that it should be considered where code is identical, not merely similar. With an IQueryable implementation there is a valid argument that you could easily end up with identical methods in multiple repositories, e.g. GetProducts in both a ManageOrderRepository and a ManageProductsRepository, vs. centralizing it in a ProductsRepository referenced by both ManageOrderController and ManageProductController. However, the implementation of GetProducts is dead simple, amounting to nearly a one-liner. A GetProducts method for a product-related controller may be interested in getting products that are active vs. inactive, whereas getting products to complete an order would likely only ever look at active products. It boils down to deciding whether satisfying DRY is worth having to manage references to a handful (or more) of repository dependencies vs. a single repository (considering things like mock setups for tests). Generic repositories specifically expect all methods across every entity type to conform to a specific pattern. Generics are great where the implementation is identical, but they fail at that goal the minute the code could benefit from being allowed to be "similar" while serving a unique variation.
Instead, I opt to pair my repository to the controller, having a ManageOrdersRepository. This repository and the methods within have only one reason to ever change, and that is to serve the ManageOrdersController. While other repositories may have similar needs from some of the entities this repository does, they are free to change to serve the needs of their controller without impacting the Manage Orders process flow. This keeps constructor dependencies compact and easy to mock.

Related

Is it okay to use .ToList() to bypass DbContext tracking?

I would like to keep the lifespan of my DbContext as short as possible, so I take advantage of the using statement.
However, since Entity Framework tracks the entities, I cannot do something like this :
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons;
}
}
Because as soon as I use the list of Persons in my ViewModel, I will receive an InvalidOperationException (the context has been disposed before I use the object).
To bypass that, I convert the result of the query to a concrete List, by calling ToList()
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons.ToList();
}
}
Is this an okay thing to do? AsNoTracking() doesn't have any effect, because it still returns entities that get dropped when the context is disposed.

When it comes to Entities you have two options. One is simple, and the other looks even simpler but is a whole lot more complex.
The simple option: An entity should never be referenced beyond the scope of its DbContext. This means that in something like an MVC app, entities don't get sent to a view. They don't get serialized; they are consumed solely within the scope of their DbContext.
The key to this approach is using POCO view models and projection (Select and AutoMapper's ProjectTo) and, when dealing with various common layers and such, adopting something like a Unit of Work pattern to manage the lifetime scope of a DbContext.
The deceptively simple option: Detached entities. Simply ensuring the DbContext is alive when returning entities can be enough to solve your problem. A Unit of Work pattern can be leveraged for this approach as well. While it looks simple enough, it is loaded with pitfalls. Serializing an entity for instance to pass back to a view will trigger lazy load calls as the serializer traverses the entity(ies). Passing entities back to be persisted is also problematic as you have to be cautious of trying to re-attach entities where the Context may already be tracking leading to situational runtime errors, overwriting data with stale copies, and exposing your system to unintended tampering if overly trusting the passed in entity. From a performance standpoint, this is also the worst option as you will be serializing and transmitting entire entity graphs back and forth rather than just the data a view needs.
Fixing your problem with option 2 is generally as easy as eager loading anything the calling code might touch. For example, if a Person has a reference to an Address:
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons
.Include(x => x.Address)
.ToList();
}
}
However, here is where the complexity creeps in. This requires the method to know about how the Persons might be consumed. Person might have references to five or more other objects/sets, and those objects may have references to more. (I.e. Address having a reference to a Country entity, or AddressType, etc.) The catch-all fix ends up being to eager load everything, which results in a lot of unnecessary data and is prone to bugs as entities evolve and code grows to rely on different bits. Then when performance becomes a problem you start diving down a rabbit hole in trying to support the caller telling this method what to eager load. As the system grows and you want to support pagination, sorting, etc. the method becomes more and more complex or it becomes slower and slower.
The solution I advocate for is to ensure that entities never cross the boundary of their DbContext, and leveraging Linq/EF's IQueryable implementation in combination with a unit of work so that consumers of these repository or service methods can be responsible for the scope of the DbContext. For example:
private AppDbContext Context
{
get { return AmbientDbContextLocator.Get<AppDbContext>(); }
}
public IQueryable<Person> GetPersons()
{
Logging.Log(Information, "Requested getPersons from service");
return Context.Persons.AsQueryable();
}
Then in calling code:
var mapperConfig = new MapperConfiguration(cfg =>
{
cfg.CreateMap<Person, PersonViewModel>(); // Can include any relevant mapping to flatten needed fields...
});
using( var contextScope = ContextScopeFactory.Create())
{
var viewModels = PersonRepository.GetPersons()
.ProjectTo<PersonViewModel>(mapperConfig)
.ToList();
return View(viewModels);
}
Often you will have low-level rules such as defaulting to only returning Active rows. .AsQueryable() is only needed if you have no rules. Anything that results in a .Where or such inside the repository/service method returns IQueryable so it could return:
return Context.Persons.Where(x => x.IsActive);
Repositories can enforce low level filtering rules like IsActive, authorization checks, etc. leaving more variable filtering, sorting, etc. up to the consumers.
The advantage of this approach is that the consumer (Controller, etc.) has full control over how the data is consumed without introducing any complexity/business logic into the repository/service. If I want to support sorting and pagination:
var viewModels = PersonRepository.GetPersons()
.OrderBy(x => x.Age)
.ProjectTo<PersonViewModel>(mapperConfig)
.Skip(pageNumber * pageSize)
.Take(pageSize)
.ToList();
The key point is that the entity (Person) doesn't leave the scope of the unit of work / DbContext; it is projected into a ViewModel which holds no references to entities, only data. That model can be safely serialized and represents only the data that the view needs. For projection I've used AutoMapper's ProjectTo as an example. You can use Linq's Select as a more manual option. The unit of work pattern I use and have outlined above is Mehdime's DbContextScope.
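For the manual Select option, a minimal sketch might look like the following. It assumes a Person with Name, Age and an Address that has a City, and a PersonViewModel with just Name and City; all of these shapes, and the PersonProjection helper, are made up for illustration. The query is passed in as an IQueryable so the same code works against a repository method or an in-memory list.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Address
{
    public string City { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public Address Address { get; set; }
}

// Holds only data, with no references back to entities.
public class PersonViewModel
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class PersonProjection
{
    // pageNumber is zero-based, matching Skip(pageNumber * pageSize).
    public static List<PersonViewModel> GetPage(IQueryable<Person> persons, int pageNumber, int pageSize)
    {
        return persons
            .OrderBy(x => x.Age)
            .Select(x => new PersonViewModel
            {
                Name = x.Name,
                City = x.Address.City // flattened from the related entity
            })
            .Skip(pageNumber * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

With EF as the provider, the navigation to Address.City inside Select is resolved as part of the query, so no Include is needed for it. The in-memory version only works if every Address is non-null.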

If you just return db.Persons, which is an IEnumerable<Person>, to some caller and then dispose the db context before that caller can iterate the result, you will get the exception, because Entity Framework queries are lazily evaluated by default. ToList gets around this by forcing iteration immediately, before the context is disposed, but it still has the overhead of object tracking, so combining ToList with AsNoTracking gets you the best performance.
AsNoTracking removes some internal tracking code inside the db context. It does not allow you to pass out the IEnumerable<Person> after disposing the context; you still need ToList for that.
Use the extension method AsNoTracking and then call ToList:
db.Persons.AsNoTracking().ToList();
Or you can disable change tracking on the change tracker property of the db context to make the whole thing read only, which avoids having to call AsNoTracking everywhere. You still need ToList with this approach.
/// <summary>
/// Disable all change tracking - place in db context sub class
/// </summary>
public void DisableChangeTracking()
{
ChangeTracker.AutoDetectChangesEnabled = false;
ChangeTracker.LazyLoadingEnabled = false;
ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
}

It is acceptable to do so. The problem with IEnumerable is that Entity Framework uses deferred execution, so the results won't be populated until you fetch them into a concrete collection (e.g. with .ToList()), which could happen after the context has been disposed of.
Your 2 possibilities are:
Executing the work on the deferred object by fetching it to a concrete collection (.ToList()).
Using dependency injection with a scoped Lifetime to let it live until the request completes.
Both are correct, and which one to prefer depends on your use-case.
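The deferred-execution problem can be reproduced without EF at all. The sketch below assumes a stand-in FakeContext whose Persons sequence refuses to produce rows once the context has been disposed; FakeContext and PersonService are hypothetical names, and a real DbContext fails in a similar way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a DbContext: rows can only be read while it is alive.
public class FakeContext : IDisposable
{
    private bool _disposed;
    private readonly List<string> _rows = new List<string> { "Ann", "Bob" };

    // Lazy sequence: the body does not run until someone enumerates it.
    public IEnumerable<string> Persons
    {
        get
        {
            foreach (var row in _rows)
            {
                if (_disposed)
                    throw new ObjectDisposedException(nameof(FakeContext));
                yield return row;
            }
        }
    }

    public void Dispose()
    {
        _disposed = true;
    }
}

public static class PersonService
{
    // Returns the lazy sequence; enumeration happens after Dispose and fails.
    public static IEnumerable<string> GetPersonsDeferred()
    {
        using (var db = new FakeContext())
        {
            return db.Persons;
        }
    }

    // Forces enumeration while the context is still alive.
    public static IEnumerable<string> GetPersonsMaterialized()
    {
        using (var db = new FakeContext())
        {
            return db.Persons.ToList();
        }
    }
}
```

Enumerating the result of GetPersonsDeferred() throws, because the context is already disposed by the time the first row is requested, while GetPersonsMaterialized() hands back a list that no longer depends on the context.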

Is DbSet<>.Local something to use with special care?

For a few days now, I have been struggling with retrieving my entities from a repository (DbContext).
I am trying to save all the entities in an atomic action. Thus, different entities together represent something of value to me. If all the entities are 'valid', then I can save them all to the database. Entity 'a' is already stored in my repository, and needs to be retrieved to 'validate' entity 'b'.
That's where the problem arises. My repository relies on the DbSet<TEntity> class which works great with Linq2Sql (Include() navigation properties e.g.). But, the DbSet<TEntity> does not contain entities that are in the 'added' state.
So I have (as far as I know) two options:
Use the ChangeTracker to see which entities are available and query them into a set based on their EntityState.
Use the DbSet<TEntity>.Local property.
The ChangeTracker seems to involve some extra hard work to get it working in a way such that I can use Linq2Sql to Include() navigation properties e.g.
The DbSet<TEntity>.Local seems a bit weird to me. It might just be the name. I just read something that it is not performing very well (slower than DbSet<> itself). Not sure if that is a false statement.
Could somebody with significant EntityFramework experience shine some light on this? What's the 'wise' path to follow? Or am I seeing ghosts and should I always use the .Local property?
Update with code examples:
An example of what goes wrong
public void AddAndRetrieveUncommittedTenant()
{
_tenantRepository = new TenantRepository(new TenantApplicationTestContext());
const string tenantName = "testtenant";
// Create the tenant, but not call `SaveChanges` yet until all entities are validated
_tenantRepository.Create(tenantName);
//
// Some other code
//
var tenant = _tenantRepository.GetTenants().FirstOrDefault(entity => entity.Name.Equals(tenantName));
// The tenant will be null, because I did not call save changes yet,
// and the implementation of the Repository uses a DbSet<TEntity>
// instead of the DbSet<TEntity>.Local.
Assert.IsNotNull(tenant);
// Can I safely use DbSet<TEntity>.Local ? Or should I play
// around with DbContext.ChangeTracker instead?
}
An example of how I want to use my Repository
In my Repository I have this method:
public IQueryable<TEntity> GetAll()
{
return Context.Set<TEntity>().AsQueryable();
}
Which I use in business code in this fashion:
public List<Case> GetCasesForUser(User user)
{
    return _repository.GetAll().
        Where(@case => @case.Owner.EmailAddress.Equals(user.EmailAddress)).
        Include(@case => @case.Type).
        Include(@case => @case.Owner).
        ToList();
}
That is mainly the reason why I prefer to stick to DbSet like variables. I need the flexibility to Include navigation properties. If I use the ChangeTracker I retrieve the entities in a List, which does not allow me to lazy load related entities at a later point in time.
If this is close to incomprehensible bullsh*t, then please let me know so that I can improve the question. I desperately need an answer.
Thx a lot in advance!

If you want to be able to 'easily' issue a query against the DbSet and have it find newly created items, then you will need to call SaveChanges() after each entity is created. If you are using a 'unit of work' style approach to working with persistent entities, this is actually not problematic, because you can have the unit of work wrap all actions within the UoW in a DB transaction (i.e. create a new TransactionScope when the UoW is created, and call Complete() on it when the UoW has completed). With this structure, the changes are sent to the DB and will be visible to the DbSet, but not visible to other UoWs (modulo whatever isolation level you use).
If you don't want the overhead of this, then you need to modify your code to make use of Local at appropriate times (which may involve looking at Local, and then issuing a query against the DbSet if you didn't find what you were looking for). The Find() method on DbSet can also be quite helpful in these situations. It will find an entity by primary key in either Local or the DB. So if you only need to locate items by primary key, this is pretty convenient (and has performance advantages as well).

As mentioned by Terry Coatta, the best approach if you don't want to save the records first would be checking both sources.
For example:
public Person LookupPerson(string emailAddress, DateTime effectiveDate)
{
Expression<Func<Person, bool>> criteria =
p =>
p.EmailAddress == emailAddress &&
p.EffectiveDate == effectiveDate;
return LookupPerson(_context.Set<Person>().Local.AsQueryable(), criteria) ?? // Search local
LookupPerson(_context.Set<Person>().AsQueryable(), criteria); // Search database
}
private Person LookupPerson(IQueryable<Person> source, Expression<Func<Person, bool>> predicate)
{
return source.FirstOrDefault(predicate);
}

For those who come after, I ran into some similar issues and decided to give the .Concat method a try. I have not done extensive performance testing so someone with more knowledge than I should feel free to chime in.
Essentially, in order to properly break up functionality into smaller chunks, I ended up with a situation in which I had a method that didn't know about consecutive or previous calls to that same method in the current UoW. So I did this:
var context = new MyDbContextClass();
var emp = context.Employees.Concat(context.Employees.Local).FirstOrDefault(e => e.Name.Contains("some name"));

This may only apply to EF Core, but every time you reference .Local of a DbSet, you're silently triggering change detection on the context, which can be a performance hit, depending on how complex your model is, and how many entries are currently being tracked.
If this is a concern, you'll want to use (for EF Core) dbContext.ChangeTracker.Entries<T>() to get the locally tracked entities, which will not trigger change detection, but does require manual filtering on the entity state, as it will include deleted and detached entities.
There's a similar version of this in EF6, but in EF Core, Entries returns a collection of EntityEntry objects, from which you'll have to select entry.Entity to get the same data the DbSet would give you.

Does filtering of data take place in the controller, service or repository layers?

I am using ASP.NET MVC 3. I get my view's data in the following sequence:
Controller -> Service Layer -> Repository
In my repository I have a GetAll method that brings back all the records for a specific object, like Category.
So if I need a list of all the categories, then in my controller I would have something like:
IEnumerable<Category> categories = categoryService.GetAll();
In the service layer I would have something like:
public IEnumerable<Category> GetAll()
{
return categoryRepository.GetAll();
}
Now this is what I need to know: where do I actually start to filter the data? Can it be done in any of these 3 layers, or does it have to be in the repository layer? Let's say I need all the parent categories. Do I put the .GetAll().Where(x => x.ParentCategoryId == null); in my controller, service layer, or repository layer?
Do I have it like this in my controller:
IEnumerable<Category> categories = categoryService.GetParentCategories();
And in my service layer I can have:
public IEnumerable<Category> GetParentCategories()
{
return categoryRepository.GetAll().Where(x => x.ParentCategoryId == null);
}
Or does my service layer have to look like this:
public IEnumerable<Category> GetParentCategories()
{
return categoryRepository.GetParentCategories();
}
And then in my repository layer like this:
public IEnumerable<Category> GetParentCategories()
{
return GetAll()
.Where(x => x.ParentCategoryId == null);
}
Please can someone help clarify this confusion that I have. There might be different scenarios. I might bring back all categories that have an active status. I might bring back categories with an inactive status. Then do I need a method for each?

You should filter as close to the data source as you can; otherwise you'll be retrieving records into upper layers only for them to be discarded by a filter. This does not scale well, so you need to expose filtering capabilities at all layers that require it, while making sure that the actual filtering is performed in the lowest layer possible, which is generally the database.
In the example you posted, if you use GetAll, which returns an IEnumerable of all the records, and only then apply the filtering, you'll have problems in the future, because you're basically loading an entire table into memory before filtering it.
Since you're using EF, you could take advantage of the deferred execution of IQueryable. Check:
.NET Entity Framework - IEnumerable VS. IQueryable
Should a Repository return IEnumerable , IQueryable or List?
Update: Following up on your comment you should also check:
LINQ to entities vs LINQ to objects - Are they the same?
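To make the difference concrete, here is a minimal sketch of a repository that applies the filter to an IQueryable before anything is enumerated. The in-memory list is only a stand-in for an EF DbSet so the example runs on its own; the Category shape and the CategoryRepository and Query names are assumptions for illustration, not code from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Category
{
    public int Id { get; set; }
    public int? ParentCategoryId { get; set; }
    public string Name { get; set; } = "";
}

public class CategoryRepository
{
    // Assumption: an in-memory list standing in for an EF DbSet<Category>.
    // A real repository would return context.Categories from Query().
    private readonly List<Category> _rows = new List<Category>
    {
        new Category { Id = 1, ParentCategoryId = null, Name = "Books" },
        new Category { Id = 2, ParentCategoryId = 1, Name = "Fiction" },
        new Category { Id = 3, ParentCategoryId = null, Name = "Music" }
    };

    // Nothing is enumerated here; callers inside the repository can keep composing.
    private IQueryable<Category> Query()
    {
        return _rows.AsQueryable();
    }

    // The Where clause is added to the query before ToList() runs it.
    // Against a real EF provider this filter would be part of the generated SQL.
    public List<Category> GetParentCategories()
    {
        return Query()
            .Where(c => c.ParentCategoryId == null)
            .ToList();
    }
}
```

With this shape the service layer simply calls GetParentCategories() and never sees the unfiltered table.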

You should always try to fetch as little as possible from the database, so you should do all filtering in your repository classes.
Many articles suggest that you create and use generic repositories. In my opinion, though, they do not work well as your application grows. I recommend that you create proper repository classes with proper search methods like:
emailRepository.GetForUser("Ada");
userRepository.GetNewUsers();
First of all, you hide implementation details like how to identify new users. It also makes the code easier to understand and extend than using a generic query.
You can also add some filtering options:
emailRepository.GetForUser("Ada", Filtering.New().Paged(1, 20).SortedBy("FirstName"));
Unlike @JoãoAngelo, I do NOT recommend that you use IQueryable outside of your repository. By doing so you move the database execution outside your repository class, which means that errors can no longer be handled by your repository.

Which types should my Entity Framework repository and service layer methods return: List, IEnumerable, IQueryable?

I have a concrete repository implementation that returns an IQueryable of the entity:
public class Repository
{
private AppDbContext context;
public Repository()
{
context = new AppDbContext();
}
public IQueryable<Post> GetPosts()
{
return context.Posts;
}
}
My service layer can then perform LINQ as needed for other methods (where, paging, etc.).
Right now my service layer is set up to return IEnumerable:
public IEnumerable<Post> GetPageOfPosts(int pageNumber, int pageSize)
{
Repository postRepo = new Repository();
var posts = (from p in postRepo.GetPosts() //this is IQueryable
orderby p.PostDate descending
select p)
.Skip((pageNumber - 1) * pageSize)
.Take(pageSize);
return posts;
}
This means in my codebehind I have to do a ToList() if I want to bind to a repeater or other control.
Is this the best way to handle the return types or do I need to be converting to list before I return from my service layer methods?

Both approaches are possible; it is only a matter of choice.
If you expose IQueryable, you get a simple repository that works in most cases, but it is harder to test, because queries defined on IQueryable are LINQ to Entities. If you mock the repository, those queries run as LINQ to Objects in your unit tests, so you are not testing your real implementation. You need integration tests to test your query logic.
If you expose IEnumerable, your repositories end up with very complex public interfaces: you need a special repository type for every entity that needs a special query exposed on the repository. This kind of repository was more common with stored procedures, where each repository method mapped to a single stored procedure. It provides better separation of concerns and a less leaky abstraction, but at the same time it removes a lot of ORM and LINQ flexibility.
Finally, you can combine the two: methods returning IEnumerable for the most common scenarios (the queries used most often) and one method exposing IQueryable for rare or complex dynamically built queries.
Edit:
As noted in the comments, using IQueryable has some side effects. When you expose IQueryable you must keep your context alive until you execute the query. IQueryable uses deferred execution in the same way as IEnumerable, so until you call ToList, First or another function that executes the query, you still need your context alive.
The simplest way to achieve that is the disposable pattern in the repository: create the context in its constructor and dispose of it when the repository is disposed. Then you can use using blocks and execute queries inside them. This approach is for very simple scenarios where you are happy with a single context per repository. More complex (and more common) scenarios require the context to be shared among multiple repositories. In that case you can use something like a (disposable) context provider or factory and inject it into the repository constructor (or let the provider create the repositories). This leads to a DAL factory and a custom unit of work.
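As a rough sketch of the simple case described above, the repository below owns its context and disposes of it, and the service executes the query inside a using block. FakeContext is a hypothetical stand-in for AppDbContext so that the example runs without EF; PostService and the seeded data are likewise assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public DateTime PostDate { get; set; }
}

// Hypothetical stand-in for AppDbContext so the sketch runs without EF.
public class FakeContext : IDisposable
{
    private readonly List<Post> _posts = new List<Post>();

    public bool IsDisposed { get; private set; }

    public FakeContext()
    {
        // Seed five posts dated 1 to 5 January 2020.
        for (int i = 1; i <= 5; i++)
        {
            _posts.Add(new Post { Id = i, PostDate = new DateTime(2020, 1, i) });
        }
    }

    public IQueryable<Post> Posts
    {
        get
        {
            // Mimics a context refusing to hand out a set after disposal.
            if (IsDisposed) throw new ObjectDisposedException(nameof(FakeContext));
            return _posts.AsQueryable();
        }
    }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

// The repository creates the context and disposes of it with itself.
public class PostRepository : IDisposable
{
    private readonly FakeContext _context = new FakeContext();

    public IQueryable<Post> GetPosts()
    {
        return _context.Posts;
    }

    public void Dispose()
    {
        _context.Dispose();
    }
}

public static class PostService
{
    public static List<Post> GetPageOfPosts(int pageNumber, int pageSize)
    {
        using (var repo = new PostRepository())
        {
            return repo.GetPosts()
                .OrderByDescending(p => p.PostDate)
                .Skip((pageNumber - 1) * pageSize)
                .Take(pageSize)
                .ToList(); // execute the query while the context is still alive
        }
    }
}
```

Because ToList() runs inside the using block, the caller receives real data and does not depend on the context after the repository has been disposed.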

Put another way, your question comes down to when and where the AppDbContext is disposed.
If you never dispose of it explicitly, so that it only goes away when the application exits, there is no problem returning an IEnumerable/IQueryable that does not yet hold actual data. Otherwise, you need to return a type that holds the actual data, such as IList, before the AppDbContext is disposed.
UPDATE:
The following code illustrates the difference, though you may already know this.
// "Outside" below refers to your calling code.
// The returned IEnumerable can be used outside this scope only if AppDbContext is guaranteed not to be disposed.
public IEnumerable<Post> GetIEnumerableWithoutActualData()
{
return context.Posts;
}
// Even if AppDbContext is disposed, this IEnumerable can still be used because ToList() has already loaded the data.
public IEnumerable<Post> GetIEnumerableWithActualData()
{
return context.Posts.ToList();
}

Your return types should always be as high up in the inheritance hierarchy as possible (or maybe I should write that as low, if the base is towards the bottom). If all your methods require IQueryable<T>, then all the return values should be of that type.
That said, IEnumerable<T> has a method (AsQueryable()) you can call to achieve (what I believe to be) the desired result.

Repository Pattern: How to implement a basic Repository including a predicate in C#?

I am new to repositories. I just read about implementing predicates and a Unit of Work (Fowler). I have seen repository interfaces like the following:
public interface IRepository<ET> {
ET Add( ET entity);
ET Remove( int id);
ET Get( int id);
IList<ET> Get(Expression<Func<ET, bool>> predicate);
}
Of course the Unit of Work would inject a data context (Microsoft fan) to the new repository, where the Unit of Work would have a .Save() method, calling Save on all data contexts.
There's no Edit method, so I assume you can modify any Entity that pops out of the Repository then call save changes on the Unit of Work.
Is this correct? Leaky? What am I missing? Should OrderBy methods never be in a repository? Should paging (.Skip().Take()) somehow be implemented in the predicate?
Links to example code out there would be fantastic, especially how to implement the predicate in a repository.

If you are referring to Entity Framework,
I would suggest you read this: Link
Update:
I am not an expert in the repository pattern, but I am using it in my project now. Apart from performance, these are the benefits I have found in this design pattern:
1. It simplifies the implementation of CRUD operations for all entities.
With one interface:
public interface IDataRepository<T> where T : class
you can replicate the others very easily and quickly:
public class EntityOneRepository : IDataRepository<EntityOne>
public class EntityTwoRepository : IDataRepository<EntityTwo>
2. It keeps my code DRY.
Some entities may have their own methods for data manipulation (e.g. stored procedures).
You can extend the interface easily without touching the other repositories.
public interface IDonationRepository : IDataRepository<Donation>
{
//method one
//method two
//....
}
Paging can be done either with Skip() and Take(), or you can define your own stored procedure in the database and call it via EF4. In that case you also benefit from the database's stored procedure caching.
Sometimes, keeping the code clean and logically readable is also important for a better app structure.

The repository interface you've presented is a very easy-to-use CRUD interface that can work well in many types of applications. In general, I'd rather not have paging and sorting parameters or options on my repository. Instead I'd return an IQueryable and let callers compose those operations into the query. As long as you stay with IQueryable, a technology like EF or NHibernate can translate those operators into SQL; if you fall back to IList or IEnumerable, they all become in-memory operations.
Although I avoid paging and sorting I might have more specific operations on a repository to shield business logic from some details. For example, I might extend IEmployeeRepository from IRepository and add a GetManagers method, or something similar to hide the Where expression needed in the query. It all depends on the application and complexity level.
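As an illustration of that idea, here is a minimal sketch of a repository with a predicate-based Get and a more specific GetManagers method that hides the Where expression. It is backed by an in-memory list so it runs on its own; the Employee class, its IsManager flag and the InMemoryEmployeeRepository name are assumptions for the example and are not taken from the linked project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public bool IsManager { get; set; } // hypothetical flag used for illustration
}

public interface IRepository<T>
{
    T Add(T entity);
    IList<T> Get(Expression<Func<T, bool>> predicate);
}

// A more specific repository shields callers from the Where expression.
public interface IEmployeeRepository : IRepository<Employee>
{
    IList<Employee> GetManagers();
}

// Assumption: an in-memory list stands in for the data context.
public class InMemoryEmployeeRepository : IEmployeeRepository
{
    private readonly List<Employee> _items = new List<Employee>();

    public Employee Add(Employee entity)
    {
        _items.Add(entity);
        return entity;
    }

    public IList<Employee> Get(Expression<Func<Employee, bool>> predicate)
    {
        // The predicate is an expression tree, so a real LINQ provider could
        // translate it to SQL; here it is applied to the in-memory list.
        return _items.AsQueryable().Where(predicate).ToList();
    }

    public IList<Employee> GetManagers()
    {
        return Get(e => e.IsManager);
    }
}
```

Callers that only need managers call GetManagers() and never have to know which column or rule identifies one.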
One important note on this sentence in your post:
"Of course the Unit of Work would inject a data context (Microsoft fan) to the new repository, where the Unit of Work would have a .Save() method, calling Save on all data contexts."
Make sure you are using a single data context/object context inside each unit of work, because a context is essentially the underlying unit of work. If you use multiple contexts in the same logical transaction, you effectively have multiple units of work.
I have a couple sample implementations in this project:
http://odetocode.com/downloads/employeetimecards.zip
The code might make more sense if you read this accompanying article:
http://msdn.microsoft.com/en-us/library/ff714955.aspx
Hope that helps,
