Linq to DocumentDb, where clause on child - c#

In a project I'm currently working on, we have come to realise that we should not use DocumentDb collections as if they were the equivalent of a table in, for example, SQL Server. As a result, we now persist all of the entities belonging to a single tenant in a single collection.
We already have lots of LINQ queries in our codebase which assume that each document type (aggregate root) is persisted in a dedicated collection. In an attempt to make the transition painless, I set out to refactor our data access object so that its API continues to reason about aggregate roots and deals with the single collection vs. dedicated collections in its implementation.
My approach is to wrap an aggregate root in a Resource<T> object, which derives from Resource and exposes a Model property as well as a Type property. I thought I would then be able to expose an IQueryable<T> to consuming code based on the following code:
return _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
.Where(x => x.Type == typeof(TModel).Name)
.Select(x => x.Model);
Initial testing showed that this worked as planned, and I confidently committed my changes. During functional testing, however, we found that some queried models had all of their properties set to their default values (i.e. null, 0, false, etc.).
I can reproduce the problem with the following code:
var wrong = _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
.Where(x => x.Type == typeof(TModel).Name)
.Select(x => x.Model)
.Where(x => !x.IsDeleted)
.ToArray();
var correct = _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
.Where(x => x.Type == typeof(TModel).Name)
.Where(x => !x.Model.IsDeleted)
.Select(x => x.Model)
.ToArray();
The results of the above queries are not the same:
Both queries return the same number of TModel instances.
Only the instances returned by the second example have their properties populated.
In order for my refactoring to be successful, I need wrong to be ... right :) Falling back to SQL is not an option, as we value the type safety of LINQ. Changing our approach to expose the Resource<T> objects would touch lots of code, as it requires all *.Property references to be replaced by *.Model.Property references.
This seems to be an issue with the LINQ provider that is part of the DocumentDb client.
We use Microsoft.Azure.DocumentDb version 1.4.1.
Edit 2015-09-24
The generated SQL queries are:
correct: {"query":"SELECT VALUE root.Model FROM root WHERE ((root.Type = \"DocumentType\") AND (NOT root.Model.IsDeleted)) "}
wrong: {"query":"SELECT * FROM root WHERE ((root.Type = \"DocumentType\") AND (NOT root.Model.IsDeleted)) "}
Also, this issue has been reported (and picked up) on GitHub here: https://github.com/Azure/azure-documentdb-net/issues/58

This has been confirmed as a problem with the SDK. A fix has been checked in and will ship with the next SDK drop.
In the interim you can use SQL, or change where you place the Where clauses so that the filter on the model is applied before the Select, as in the correct query above.
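One way to move the Where clause without touching consuming code is to rewrite the caller's predicate so that it targets the wrapper before the projection. The following is only a sketch under assumptions: Wrapper<T> stands in for the question's Resource<T>, Customer, ModelQuery, Lift and WhereModel are hypothetical names, and the demo runs against an in-memory IQueryable. Whether the DocumentDb LINQ provider in version 1.4.1 translates the rewritten expression has not been verified here.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the question's Resource<T> wrapper.
public class Wrapper<T>
{
    public string Type { get; set; }
    public T Model { get; set; }
}

// Hypothetical aggregate root used only for the demo.
public class Customer
{
    public string Name { get; set; }
    public bool IsDeleted { get; set; }
}

public static class ModelQuery
{
    // Turns a predicate over TModel (m => ...) into a predicate over
    // Wrapper<TModel> (x => ... with m replaced by x.Model).
    public static Expression<Func<Wrapper<TModel>, bool>> Lift<TModel>(
        Expression<Func<TModel, bool>> predicate)
    {
        var wrapper = Expression.Parameter(typeof(Wrapper<TModel>), "x");
        var model = Expression.Property(wrapper, nameof(Wrapper<TModel>.Model));
        var body = new ReplaceParameter(predicate.Parameters[0], model).Visit(predicate.Body);
        return Expression.Lambda<Func<Wrapper<TModel>, bool>>(body, wrapper);
    }

    // Applies the type filter and the lifted predicate BEFORE the Select,
    // which is the ordering that produced correct results in the question.
    public static IQueryable<TModel> WhereModel<TModel>(
        IQueryable<Wrapper<TModel>> source,
        Expression<Func<TModel, bool>> predicate)
    {
        var typeName = typeof(TModel).Name;
        return source
            .Where(x => x.Type == typeName)
            .Where(Lift(predicate))
            .Select(x => x.Model);
    }

    // Swaps one parameter for an arbitrary expression inside a lambda body.
    private sealed class ReplaceParameter : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly Expression _to;

        public ReplaceParameter(ParameterExpression from, Expression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
            => node == _from ? _to : base.VisitParameter(node);
    }
}

public static class Demo
{
    // In-memory demonstration; a real caller would pass the document query instead.
    public static string[] Run()
    {
        var documents = new[]
        {
            new Wrapper<Customer> { Type = "Customer", Model = new Customer { Name = "Ann" } },
            new Wrapper<Customer> { Type = "Customer", Model = new Customer { Name = "Bob", IsDeleted = true } },
            new Wrapper<Customer> { Type = "Order", Model = new Customer { Name = "Cy" } },
        }.AsQueryable();

        return ModelQuery.WhereModel(documents, c => !c.IsDeleted)
            .ToArray()
            .Select(c => c.Name)
            .ToArray();
    }
}
```

With this shape, consuming code keeps writing predicates against the model (c => !c.IsDeleted), while the data access object decides where in the query they end up.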

Related

Perform includes on key object, using LINQ, in a GroupBy situation

I have a relatively simple, yet somehow weirdly complicated case whereby I need to perform includes on a lengthy object graph, when I'm doing a group-by.
Here is roughly what my LINQ looks like:
var result = DbContext.ParentTable
.Where(p => [...some criteria...])
.GroupBy(p => p.Child)
.Select(n => new
{
ChildObject = n.Key,
AllTheThings = n.Sum(p => p.SomeNumericColumn),
LatestAndGreatest = n.Max(p => p.SomeDateColumn)
})
.OrderByDescending(o => o.AllTheThings)
.Take(100)
.ToHashSet();
That gives me a list of anonymous objects, just the way I want it, with child objects neatly associated with some aggregate stats about each object. Fine. But I also need a fair share of the object graph associated with the child object.
This gets a bit messier than it otherwise would because I want to reuse existing code that already performs the includes. That is, I have a static method which takes an IQueryable of my child object and, based upon parameters, gives me back another IQueryable with all the proper includes that I need (there are rather a lot of them).
I can't seem to figure out the correct way to take my child object as a queryable and give that to my include method, so that I get it back expanded at the point where I assign it to the new anonymous object (where I'm saying ChildObject = n.Key).
Sorry if this is something of a duplicate. I did search around and found solutions that were close to what I want, but not quite.

LINQ Include slowing down performance when searching

We have the following method that allows us to search a table of Projects for a DataGrid:
public async Task<IEnumerable<Project>> GetFilteredProjects(string searchString)
{
var projects = _context.Projects.Where(p => p.Current);
projects.Include(p => p.Client);
projects.Include(p => p.Architect);
projects.Include(p => p.ProjectManager);
if (!string.IsNullOrEmpty(searchString))
{
projects = projects
.Where(p => p.NormalizedFullProjectName.Contains(searchString)
|| p.Client.NormalizedName.Contains(searchString)
|| p.Architect.NormalizedFullName.Contains(searchString)
|| p.ProjectManager.NormalizedFullName.Contains(searchString));
}
projects = projects.OrderBy(p => p.Name).Take(10);
return await projects.ToListAsync();
}
If we do not use the Include on the projects then the searching is instantaneous. However, after adding them in the search can take over 3 seconds.
We need to include the other Entities to allow the Users to search on them should they want to.
How are we able to improve performance but still keep the Include to allow searching on them?
Without Include, the method looks like so:
public async Task<IEnumerable<Project>> GetFilteredProjects(string searchString)
{
var projects = _context.Projects.Where(p => p.Current);
if (!string.IsNullOrEmpty(searchString))
{
projects = projects
.Where(p => p.Name.Contains(searchString));
}
projects = projects.OrderBy(p => p.Name).Take(10);
return await projects.ToListAsync();
}
[Image: performance without Include]
[Image: performance with Include]
The short answer is that including all the extra entities takes time and effort, thus increasing the load times.
However, there is a flaw in your assumption:
"We need to include the other Entities to allow the Users to search on them should they want to."
That is not (necessarily) correct. Filtering happens on the database level. Include tells Entity Framework to load the records from the database. These are two separate things.
Look at the following examples:
_context.Projects
.Include(p => p.Architect)
.Where(p => p.Architect.Name == "Bob")
.ToList()
This will give you a list of projects (and their architects) that have an architect named Bob.
_context.Projects
.Where(p => p.Architect.Name == "Bob")
.ToList()
This will give you a list of projects (without architects) that have an architect named Bob. The filter is applied in the database, and the Architect objects are not loaded into memory.
_context.Projects
.Include(p => p.Architect)
.ToList()
This will give you a list of projects (and their architects). It will contain every project; the list is not filtered.
You only need Include when the related entities actually have to be loaded into memory, for example when you want to do in-memory filtering on a collection that was already loaded from the database.
Whether that is the case for you depends on this part:
projects = projects
.Where(p => p.NormalizedFullProjectName.Contains(searchString)
|| p.Client.NormalizedName.Contains(searchString)
|| p.Architect.NormalizedFullName.Contains(searchString)
|| p.ProjectManager.NormalizedFullName.Contains(searchString));
If NormalizedFullProjectName (and the other properties) are database columns, then EF is able to perform the filtering at the database level. You do not need the Include for filtering the items.
If NormalizedFullProjectName (and the other properties) are not database columns, then EF will first have to load the items in memory before it can apply the filter. In this case, you do need the Include, because the architects (and others) need to be loaded in memory.
If you are only loading the related entities for filtering purposes (not display purposes), and you are doing the filtering on the database level; then you can simply remove the include statements.
If you need those related entities to be loaded (for in-memory filtering, or for display purposes), then you can't easily remove the Include statements, unless you write a Select that specifies the fields you need.
For example:
_context.Projects
.Select(p => new { Project = p, ArchitectName = p.Architect.Name })
.ToList()
This will load the project entities (in their entirety) but only the name of the architect and none of the other properties. This can be a significant performance boost if your related entities have many properties that you currently do not need.
Note: The current example uses an anonymous type. I generally advocate creating a custom type for this; but that's unrelated to the performance issue we're addressing here.
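As a rough illustration of the custom-type variant mentioned in the note, the projection can target a small named class instead of an anonymous type. This is a sketch under assumptions: ProjectListItem, ProjectQueries and Search are hypothetical names, the entity classes are reduced to the properties needed here, and the demo uses an in-memory IQueryable rather than a real DbContext. Against Entity Framework the same Where and Select would be expected to translate to SQL without any Include.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the question's entities.
public class Architect
{
    public string Name { get; set; }
    public string Biography { get; set; } // a column the grid does not need
}

public class Project
{
    public string Name { get; set; }
    public Architect Architect { get; set; }
}

// Hypothetical custom type holding only what the grid displays.
public class ProjectListItem
{
    public string ProjectName { get; set; }
    public string ArchitectName { get; set; }
}

public static class ProjectQueries
{
    // Filters on the related entity and projects only the needed fields.
    // No Include is involved; the navigation property is used directly.
    public static IQueryable<ProjectListItem> Search(IQueryable<Project> projects, string searchString)
    {
        return projects
            .Where(p => p.Name.Contains(searchString)
                     || p.Architect.Name.Contains(searchString))
            .OrderBy(p => p.Name)
            .Take(10)
            .Select(p => new ProjectListItem
            {
                ProjectName = p.Name,
                ArchitectName = p.Architect.Name
            });
    }
}

public static class Demo
{
    public static string[] Run()
    {
        var projects = new List<Project>
        {
            new Project { Name = "Bridge", Architect = new Architect { Name = "Bob" } },
            new Project { Name = "Tower", Architect = new Architect { Name = "Alice" } },
            new Project { Name = "Bobsled Track", Architect = new Architect { Name = "Carol" } },
        }.AsQueryable();

        return ProjectQueries.Search(projects, "Bob")
            .Select(i => i.ProjectName + " / " + i.ArchitectName)
            .ToArray();
    }
}
```

The demo matches one project by its own name and one by its architect's name, which mirrors the search across related entities that the question asks for.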
Update
Based on your update, you seemingly imply that the intended filtering happens after the objects have been loaded from the database.
This is the source of your performance problems. You are fetching a lot of data but only show part of it. The data that does not get shown still needs to be loaded, which is wasted effort.
There are separate arguments for performance here:
Load everything once - Load all the data once (which might take a long time), but then allow the user to filter the loaded data (which is very fast)
Load chunks - Only load the data that matches the applied filters. If the user changes the filters, you load the data again. The first load will be much faster, but the subsequent filtering actions will take longer compared to in-memory filtering.
What you should do here is not my decision. It's a matter of priorities. Some customers prefer one over the other. I would say that in most cases, the second option (loading chunks) is the better option here, as it prevents needlessly loading a massive dataset if the user never looks through 90% of it. That's a waste of performance and network load.
The answer I gave applies to the "load chunks" approach.
If you decide to take the "load everything once" approach, then you will have to accept the performance hit of that initial load. The best you can do is severely limit the returned data columns (like I showed with the Select) in order to minimize the performance/network cost.
I see no reasonable argument to mix these two approaches. You'll end up with both drawbacks.

How to include optional relations in C# using Code First Entity Framework

Given the DB below, I would like to retrieve all bricks in C# and include the Workshops on those BrickBacks that have any.
I managed to retrieve all the Bricks and include the BrickBacks by simply doing
context.Bricks.Include(b=>b.Back).ToList()
But in this case BrickBack is an abstract class whose subclasses may contain a Workshop, though this is not always the case.
Unfortunately I can't just do
context.Bricks.Include(b=>b.Back).Include(b=>b.Back.Workshop).ToList()
How can this be done?
This could work:
context.Bricks.Include("Back").Include("Workshop").ToList()
Workshop will be null if Workshop_Id is null in the database.
Not possible. Maybe you can approach it from a different angle:
ConcreteBacks.Include(b => b.WorkShop)
.Include(b => b.Bricks)
.AsEnumerable()
.SelectMany(b => b.Bricks)
This will pull all ConcreteBacks and the included data from the database and then return the Bricks in a flattened list.
The AsEnumerable() is necessary because EF only includes data off the root entities in the result set. Without AsEnumerable() this would be Brick and the ConcreteBacks would be ignored. But now EF only knows about the part before AsEnumerable(), so it includes everything off ConcreteBack.

OData V3 exclude property

I wish to exclude a particular property from an entity collection over OData.
I've used .Expand("Files") to retrieve that particular collection. The entities are files in the database, and I want all of the metadata of each file (like Name and MimeType), but not the binary body itself.
I am not allowed to change the OData service itself, so if it's possible at all, it must be done using an instruction in the OData query.
Any thoughts? Thanks in advance.
UPDATE2: Vagif has been helpful, but made me realize I did not phrase my question entirely correctly. Once again: apologies. The actual property can be in the class that is returned by expanding "Files", but it must be null. In other words: I'd like the expanded child records to have a property that is not filled with data.
UPDATE: Thanks nemesv. I should indeed have been more specific. The OData service is built using the OData NuGet package in Visual Studio with C#. The client uses the same tools. The server, however, uses OData 5.0.1. The client can use any version I want (5.6 now, I think).
If you use C# and LINQ on the client side, you can select the properties you want to include in the payload using Select clause and anonymous types, e.g.:
var results = context.MyCollection.Select(x => new { x.Name, x.Category });
UPDATE
The trick with selecting columns from an expanded entity is that you don't need to explicitly expand it: the WCF Data Services LINQ provider will figure it out. Here's an example using the Northwind data model:
[Test]
public void SelectProductAndCategoryColumns()
{
var result = _context.Products
.Where(x => x.ProductID == 1)
.Select(x => new { x.ProductID, x.Category.CategoryID, x.Category.CategoryName })
.Single();
Assert.AreEqual(1, result.ProductID);
Assert.AreNotEqual(0, result.CategoryID);
Assert.IsNotNull(result.CategoryName);
}
Note that I am chaining expanded columns in the Select clause without using Expand clause. And it works this way (but only for one level, you can't chain them further).

EntityFramework 5 filter an included navigation property

I would like to find a way using Linq to filter a navigation property to a subset of related entities. I know all answers around this subject suggest doing an anonymous selector such as:
query.Where(x => x.Users.Any(y => y.ID == actingUser.ID))
.Select(x => new
{
Event = x,
Discussions = x.Discussions.Where(actingUser.GenerateSecurityFilterFor<Domain.Discussion>())
})
.OrderBy(x => x.Discussions.Count())
.ThenBy(x => x.Event.Name);
However, this is far from ideal given the generic nature of our query generation, and it also yields horrific SQL queries if you look at them in a profiler.
I would like to be able to accomplish something like:
query.Include(x => x.Discussions.Where(actingUser.GenerateSecurityFilterFor<Domain.Discussion>()))
.OrderBy(x => x.Discussions.Count())
.ThenBy(x => x.Name);
I realize that this is not supported in EF5 (or any version for that matter) but there has to be a way to accomplish constraining the result set through Linq without delving into anonymous type select statements.
I have attempted doing something to the tune of:
query.GroupJoin(discquqery,
x => x.ID,
x => x.Event.ID,
(evt, disc) => evt.Discussions = disc.Where(actingUser.GenerateSecurityFilterFor<Domain.Discussion>())).ToList();
However you cannot have assignment inside a lambda expression and selecting an anonymous type here causes the same dilemma that it does using the select.
I guess I cannot comprehend why EF does not provide a way (that I can find) to generate:
SELECT
--Properties
FROM Event e
LEFT OUTER JOIN Discussions d
ON e.ID = d.EventID AND --Additional constraints
WHERE
--Where conditions
ORDER BY
--Order Conditions
It is so simple to constrain the join in SQL there HAS to be a way to do it through Linq as well.
PS: I have searched stack, MSDN, experts-exchange, etc. Please realize this is not a duplicate. Anything even touching on this subject either has a cop-out "It can't be done" answer or no answer at all. Nothing is impossible... including this.
"Anything even touching on this subject either has a cop-out 'It can't be done' answer or no answer at all. Nothing is impossible... including this."
Sure. It is possible. You can download the EF source code and add this feature yourself. It would be a great contribution to the open source project and the community. I believe the EF team will gladly help you with your effort.
With the current version, "it can't be done" is the answer. You can either use a projection to an anonymous type or a special unmapped type, as you described at the beginning of your question. The other options are a separate explicit query to load related entities for a single parent, or a separate query to load related entities for all parents.
Load relations for single parent:
context.Entry(@event) // "event" is a C# keyword, so the variable needs the @ prefix
.Collection(e => e.Discussions)
.Query()
.Where(d => ...)
.Load();
Load relations for all parents (requires lazy loading to be turned off):
// load all parents
var events = query.Where(e => ...).ToList();
// load child filtered by same condition for parents and new condition for children
childQuery.Where(d => d.Event ... && d.Something ...).Load();
The second solution requires the child to have a navigation property back to the parent (for constructing the same query condition that was initially used to load the parents). If you have everything correctly configured and the entities are attached, EF should automatically fix up your relations (collections) in the parent entities. It will not mark the collection in a dynamic proxy as loaded, however, which is why you cannot use this together with lazy loading.
