Out of Memory Lambda Compile versus inline delegates

I'm using .NET 4.5.1 with an application that, on the server side, shuffles chart data across many simultaneous REST requests.
I use IQueryable to build the queries. For example, I originally had the following:
var query = ctx.Respondents
.Join(
ctx.Respondents,
other => other.RespondentId,
res => res.RespondentId,
(other, res) => new ChartJoin { Respondent = res, Occasion = null, BrandVisited = null, BrandInfo = null, Party = null, Item = null }
)
. // bunch of other joins filling out the ChartJoin
.Where(x => x.Respondent.status == 1)
. // more Where clauses dynamically applied
.GroupBy(x => new CommonGroupBy { Year = (int)x.Respondent.currentVisitYear, Month = (int)x.Respondent.currentVisitMonth })
.OrderBy(x => x.Key.Year)
.ThenBy(x => x.Key.Month)
.Select(x => new AverageEaterCheque
{
Year = x.Key.Year,
Month = x.Key.Month,
AverageCheque = (double)(x.Sum(m => m.BrandVisited.DOLLAR_TOTAL) / x.Sum(m => m.BrandVisited.NUM_PAID)),
Base = x.Count(),
Days = x.Select(m => m.Respondent.visitDate).Distinct().Count()
});
To allow for dynamic grouping (via the client), the GroupBy was generated with C# expressions returning a Dictionary. The Select also had to be generated with expressions. The above Select became something like:
public static Expression<Func<IGrouping<IDictionary<string, object>, ChartJoin>, AverageEaterCheque>> GetAverageEaterChequeSelector()
{
// x =>
var ParameterType = typeof(IGrouping<IDictionary<string, object>, ChartJoin>);
var parameter = Expression.Parameter(ParameterType);
// x => x.Sum(m => m.BrandVisited.DOLLAR_TOTAL) / x.Sum(m => m.BrandVisited.NUM_PAID)
var m = Expression.Parameter(typeof(ChartJoin), "m");
var mBrandVisited = Expression.PropertyOrField(m, "BrandVisited");
PropertyInfo DollarTotalPropertyInfo = typeof(BrandVisited).GetProperty("DOLLAR_TOTAL");
PropertyInfo NumPaidPropertyInfo = typeof(BrandVisited).GetProperty("NUM_PAID");
....
return a lambda...
}
When I did a test run locally, I got an Out of Memory error. Then I started reading blogs from Totin and others saying that lambda compilation, and expression trees in general, are expensive. I had no idea it would blow up my application. I need the ability to add grouping dynamically, which led me to use expression trees for the GroupBy and Select clauses.
I would love some pointers on how to chase down the memory offenders in my application. I have seen some people use dotMemory, but some practical tips would be great as well. I have very little experience in monitoring C# and .NET applications.

Since you're compiling the expression into a delegate, the operation is performed using LINQ to Objects, rather than using the IQueryable overload. This means that the entirety of the data set is being pulled into memory, and all of the processing done by the application, instead of that processing being done in the database and only the final results being sent to the application.
Apparently pulling down the entire table into memory is enough to run your application out of memory.
You need to not compile the lambda, and leave it as an expression, thus allowing the query provider to translate it into SQL, as is done with your original code.
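A minimal sketch of the overload difference described above, using an in-memory list wrapped with AsQueryable() as a stand-in for the real context. The Respondent class, its Status property, and the OverloadDemo helper are hypothetical names for illustration only. Passing the Expression<Func<...>> binds to Queryable.Where, which keeps the predicate in the expression tree for a provider to translate; passing the compiled delegate binds to Enumerable.Where, which runs in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, standing in for the real Respondent class.
public class Respondent
{
    public int Status { get; set; }
}

public static class OverloadDemo
{
    // Returns true when the filtered sequence is still an IQueryable.
    public static bool StaysQueryable(bool compileFirst)
    {
        // In-memory stand-in for ctx.Respondents.
        IQueryable<Respondent> source = new List<Respondent>
        {
            new Respondent { Status = 1 },
            new Respondent { Status = 0 }
        }.AsQueryable();

        Expression<Func<Respondent, bool>> predicate = r => r.Status == 1;

        object result = compileFirst
            // A Func<> argument binds to Enumerable.Where: LINQ to Objects.
            ? (object)source.Where(predicate.Compile())
            // An Expression<> argument binds to Queryable.Where: still composable.
            : source.Where(predicate);

        return result is IQueryable<Respondent>;
    }

    public static void Main()
    {
        Console.WriteLine(StaysQueryable(compileFirst: false)); // True
        Console.WriteLine(StaysQueryable(compileFirst: true));  // False
    }
}
```

With a database-backed provider, only the first form gives the provider a chance to translate the filter to SQL; the second enumerates the source and filters on the client.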

Related

C# - LINQ Lambda expression using GroupBy - Why nested validations are so inefficient?

I had a bad day trying to improve the performance of the following query in C#, using Entity Framework (the data is stored in SQL Server and the model uses a Code First approach, but that does not matter here):
Bad performance query:
var projectDetail = await _context
.ProjectDetails
.Where(pd => projectHeaderIds.Contains(pd.IdProjectHeader))
.Include(pd => pd.Stage)
.Include(pd => pd.ProjectTaskStatus)
.GroupBy(g => new { g.IdProjectHeader, g.IdStage, g.Stage.StageName })
.Select(pd => new
{
pd.Key.IdProjectHeader,
pd.Key.IdStage,
pd.Key.StageName,
TotalTasks = pd.Count(),
MissingCriticalActivity = pd.Count(t => t.CheckTask.CriticalActivity && t.ProjectTaskStatus.Score != 100) > 0,
Score = Math.Round(pd.Average(a => a.ProjectTaskStatus.Score), 2),
LastTaskCompleted = pd.Max(p => p.CompletionDate)
}).ToListAsync();
After some hours, I figured out the problem and was able to fix the performance (instead of taking more than 4 minutes, the new query takes only 1-2 seconds):
New query
var groupTotalTasks = await _context
.ProjectDetails
.Where(pd => projectHeaderIds.Contains(pd.IdProjectHeader))
.Select(r => new
{
r.IdProjectHeader,
r.CompletionDate,
r.IdStage,
r.ProjectTaskStatus.Score,
r.CheckTask.CriticalActivity,
r.Stage.StageName
})
.GroupBy(g => new { g.IdProjectHeader, g.IdStage, g.StageName })
.Select(pd => new
{
pd.Key.IdProjectHeader,
pd.Key.IdStage,
pd.Key.StageName,
TotalTasks = pd.Count(),
MissingCriticalActivity = pd.Count(r => r.CriticalActivity && r.Score != 100) > 0,
Score = Math.Round(pd.Average(a => a.Score), 2),
LastTaskCompleted = pd.Max(p => p.CompletionDate)
}).ToListAsync();
The steps to improve the query were the following:
Avoid nested validations (like Score, which used MainQuery.ProjectTaskStatus.Score to calculate the average)
Avoid Include in the queries
Use a Select to fetch only the information needed later in the GroupBy.
Those changes fixed my issue, but why?
And is there still another way to improve this query?
Specifically, why does the use of nested validations make the query extremely slow?
The other changes make more sense to me.

I recently read that whenever EF Core 2 ran into anything that it couldn't produce a SQL Query for, it would switch to in-memory evaluation. So the first query would basically be pulling all of your ProjectDetails out of the database, then doing all the grouping and such in your application's memory. That's probably the biggest issue you had.
Using .Include had a big impact in that case, because you were including a bunch of other data when you pulled out all those ProjectDetails. It probably has little to no impact now that you've avoided doing all that work in-memory.
The EF Core team realized the error of their ways and, starting with EF Core 3, changed the behavior to throw an exception in cases like that.
To avoid problems like this in the future, you can upgrade to EF Core 3, or just be really careful to ensure Entity Framework can translate everything in your query to SQL.

LINQ statement is not translatable

I have the following code containing LINQ statements:
public async Task<HashSet<long>> GetMembersRecursive(IEnumerable<long> groupIds)
{
var containsGroupId = InExpression<Group>("Id", groupIds);
var containsParentId = InExpression<RecursiveGroupModel>("ParentId", groupIds);
var groupIdsArray = groupIds as long[] ?? groupIds.ToArray();
return new HashSet<long>(await MyContext
.Groups
.Where(containsGroupId)
.Select(a => new
{
Members = MyContext
.ViewWithRecursiveGroups
.Where(containsParentId)
.SelectMany(c => c.Group.Members)
.Union(a.Members)
.Where(b => !b.User.IsActive)
})
.SelectMany(a => a.Members.Select(b => b.MemberId))
.Distinct()
.ToListAsync());
}
private static Expression<Func<T, bool>> InExpression<T>(string propertyName, IEnumerable<long> array)
{
var p = Expression.Parameter(typeof(T), "x");
var contains = typeof(Enumerable).GetMethods(BindingFlags.Static | BindingFlags.Public)
.Single(x => x.Name == "Contains" && x.GetParameters().Length == 2)
.MakeGenericMethod(typeof(long));
var property = Expression.PropertyOrField(p, propertyName);
var body = Expression.Call(
contains
, Expression.Constant(array)
, property
);
return Expression.Lambda<Func<T, bool>>(body, p);
}
The error I receive is:
Microsoft.EntityFrameworkCore: Processing of the LINQ expression 'DbSet<RecursiveGroupModel>
.Where(b => __groupIdsArray_1
.Contains(b.ParentId))
.SelectMany(c => c.Group.GroupMembers)
.Union((MaterializeCollectionNavigation(
navigation: Navigation: Group.GroupMembers,
subquery: (NavigationExpansionExpression
Source: DbSet<GroupMember>
.Where(l0 => EF.Property<Nullable<long>>(l, "Id") != null && EF.Property<Nullable<long>>(l, "Id") == EF.Property<Nullable<long>>(l0, "GroupId1"))
PendingSelector: l0 => (NavigationTreeExpression
Value: (EntityReference: GroupMember)
Expression: l0)
)
.Where(i => EF.Property<Nullable<long>>((NavigationTreeExpression
Value: (EntityReference: Group)
Expression: l), "Id") != null && EF.Property<Nullable<long>>((NavigationTreeExpression
Value: (EntityReference: Group)
Expression: l), "Id") == EF.Property<Nullable<long>>(i, "GroupId1"))))' by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core. See https://go.microsoft.com/fwlink/?linkid=2101433 for more detailed information.
The view:
CREATE VIEW [dbo].[View_WithRecursiveGroups] AS
WITH RecursiveGroups (GroupId, ParentId) AS
(
SELECT Id, ParentId
FROM Group
WHERE ParentId IS NOT NULL
UNION ALL
SELECT Group.Id, t.ParentId
FROM GroupTree t
JOIN Group ON t.GroupId = Group.ParentId
)
SELECT * FROM RecursiveGroups
Apologies in advance if some variable names don't match up; I had to sanitize before posting.
I understand that it cannot convert the code to SQL, so it's asking me to enumerate early or rewrite the query so that it's translatable. I have tried rearranging the query and breaking it up into smaller queries, but the SelectMany on the recursive view does not seem to be convertible to SQL.
Is there a way to get this working in-database? Or am I going about this completely the wrong way?

As an alternative, you can use a raw SQL query. In Entity Framework Core, we need to define a POCO class and a DbSet for that class. In your case you will need to define some YourClass:
public DbQuery<YourClass> YourClasses { get; set; }
and the code to execute it (note that in EF Core 3.0 and later, FromSql was replaced by FromSqlRaw and FromSqlInterpolated):
var result = context.YourClasses.FromSql("YOURSQL_SCRIPT").ToList();
var asyncresult = await context.YourClasses.FromSql("YOURSQL_SCRIPT").ToListAsync();

Yeah, welcome to the wonderful world of EfCore 3.1 where all you can do is "Hello world".
Your query has various "problems" because EfCore does not really do LINQ processing except for super easy cases.
.Union(a.Members)
This cannot be translated to run server side, and client-side processing is not enabled. Your only choices are:
Force server execution for both parts (using AsEnumerable), then Union on the client. That only works if you do not use that as part of a larger statement (e.g. Intersect); otherwise it is "pull all the data to the client" time, and that is not good.
At the current point in time I can only advise you to throw out EfCore and use EntityFramework which - as per framework 3.1 - is again available. Or use Entity Framework Classic, which is a port that runs on netstandard 2.0 and has global query filters (which are THE ONE feature of EfCore I like). At least this is what I am currently moving to because - well - "better but without any features and not working" is not cutting it for me.
Whether or not EfCore will be extended (they seem not to see it as a fix) to handle anything except the most basic LINQ statements (and sometimes not even those) is unknown at this point - a lot of the changes in 3.1 are quite discouraging.
You MAY be able to move it into views etc. - but you may find out quite fast that EfCore has even more limitations, and maintaining all the views gets quite tedious, too. I ran into serious problems with the fact that I cannot put any condition in front of any projection even in the most simple cases. And even simple bugs get commented on with "we do not feel comfortable changing the pipeline, please wait for version 5 in november". Example? https://github.com/dotnet/efcore/issues/15279.

Given that you want to convert this view to LINQ:
CREATE VIEW [dbo].[View_WithRecursiveGroups] AS
WITH RecursiveGroups (GroupId, ParentId) AS
(
SELECT Id, ParentId
FROM Group
WHERE ParentId IS NOT NULL
UNION ALL
SELECT Group.Id, t.ParentId
FROM GroupTree t
JOIN Group ON t.GroupId = Group.ParentId
)
// Anchor part of the CTE: groups that have a parent.
var data1 = db.Groups.Where(x => x.ParentId != null)
    .Select(x => new { x.Id, x.ParentId })
    .ToList();
// Joined part of the CTE: groups joined to GroupTree on the parent id.
var data2 = (from g in db.Groups
             join gt in db.GroupTree on g.ParentId equals gt.GroupId
             select new { g.Id, gt.ParentId })
    .ToList();
Create a class representing the data, have each query return a List of that known type, and
just union the two lists.
LINQPad is a very useful tool for learning how to write the LINQ that gives you the SQL you want.

EF Core query from Include but no need to return in result

Okay, so this probably comes from not knowing how to use EF Core correctly, as this is only my 2nd day using it, but I appear to have to run .Include() in order to query against the "inner join" this method creates.
I've got this bit working and filtering down my results, but I don't want to return the include in the model, as it turns a 256 KB JSON response into a 2.8 MB JSON response.
public IEnumerable<Property> GetFeaturedProperties()
{
var properties = _db.Properties
.Include(p => p.GeoInfo)
.Include(p => p.FeaturedUntil)
.ToArray();
var featuredProperties = new List<Property>();
var DateTimeNow = DateTime.Now;
var midnightToday = DateTimeNow.AddHours(-DateTimeNow.Hour)
.AddMinutes(-DateTimeNow.Minute)
.AddSeconds(-DateTimeNow.Second-1);
for (var i = 0; i < properties.Count(); i++)
{
var property = properties[i];
if(property.FeaturedUntil.Any(p => p.FeaturedDate >= midnightToday))
featuredProperties.Add(property);
}
return featuredProperties;
}
So the offending line is .Include(p => p.FeaturedUntil). This is an array of dates that can be anything from 10 to 1000 rows per joined row. It includes ALL data, even historical, so this is really racking up data.
Can I run my query and then run something to .RemoveInclude(p => p.FeaturedUntil)?

You don't need to load the navigation properties in order to apply filtering. When you access a navigation property inside a LINQ to Entities query, it's translated to the corresponding SQL construct, including JOINs. No real objects or collections are involved. The whole query (with some exceptions) executes on the server (database) side.
In your case, the following simple query will do the job:
public IEnumerable<Property> GetFeaturedProperties()
{
var DateTimeNow = DateTime.Now;
var midnightToday = DateTimeNow.AddHours(-DateTimeNow.Hour)
.AddMinutes(-DateTimeNow.Minute)
.AddSeconds(-DateTimeNow.Second-1);
return _db.Properties
.Include(p => p.GeoInfo) // assuming you want to return this data
.Where(p => p.FeaturedUntil.Any(f => f.FeaturedDate >= midnightToday))
.ToList();
}
For more info, see the How Queries Work documentation topic.

How to do mongodb queries faster?

I have lots of queries like Sample 1, Sample 2 and Sample 3 below. There are more than 13 million records in the MongoDB collection, so these queries take a long time. Is there any way to make them faster?
I am thinking of using an IMongoQuery object to solve this problem. Is there a better way?
Sample 1:
var collection = new MongoDbRepo().DbCollection<Model>("tblmodel");
decimal total1 = collection.FindAll()
.SelectMany(x => x.MMB.MVD)
.Where(x => x.M01.ToLower() == "try")
.Sum(x => x.M06);
Sample 2:
var collection = new MongoDbRepo().DbCollection<Model>("tblmodel");
decimal total2 = collection.FindAll().Sum(x => x.MMB.MVO.O01);
Sample 3:
var list1= collection.FindAll()
.SelectMany(x => x.MHB.VLH)
.Where(x => x.V15 > 1).ToList();
var list2= list1.GroupBy(x => new { x.H03, x.H09 })
.Select(lg =>
new
{
Prop1= lg.Key.H03,
Prop2= lg.Count(),
Prop3= lg.Sum(w => w.H09),
});

The function FindAll returns a MongoCursor. When you add LINQ extension methods on to the FindAll, all of the processing happens on the client, not the Database server. Every document is returned to the client. Ideally, you'll need to pass in a query to limit the results by using Find.
Or, you could use the AsQueryable function to better utilize LINQ expressions and the extension methods:
var results = collection.AsQueryable().Where(....);
I don't understand your data model, so I can't offer any specific suggestions as to how to add a query that would filter more of the data on the server.
You can use the SetFields chainable method after FindAll to limit the fields that are returned if you really do need to return every document to the client for processing.
You also might find that writing some of the queries using the MongoDB aggregation framework might produce similar results, without sending any data to the client (except the results). Or, possibly a Map-Reduce depending on the nature of the data.

Linq to sql expression tree execution zone issue

I have got a bit of an issue and was wondering if there is a way to have my cake and eat it.
Currently I have a Repository and Query style pattern for how I am using Linq2Sql, however I have got one issue and I cannot see a nice way to solve it. Here is an example of the problem:
var someDataMapper = new SomeDataMapper();
var someDataQuery = new GetSomeDataQuery();
var results = SomeRepository.HybridQuery(someDataQuery)
.Where(x => x.SomeColumn == 1 || x.SomeColumn == 2)
.OrderByDescending(x => x.SomeOtherColumn)
.Select(x => someDataMapper.Map(x));
return results.Where(x => x.SomeMappedColumn == "SomeType");
The main bits to pay attention to here are the Mapper, Query, Repository and then the final where clause. I am doing this as part of a larger refactor, and we found that there were a lot of similar queries which were getting slightly different result sets back but then mapping them the same way to a domain-specific model. Take, for example, getting back a tbl_car and then mapping it to a Car object. A mapper basically takes one type and spits out another, exactly as would normally happen in the select:
// Non mapped version
.Select(x => new Car
{
Id = x.Id,
Name = x.Name,
Owner = x.FirstName + x.Surname
});
// Mapped version
.Select(x => carMapper.Map(x));
So the car mapper is more reusable across all areas which do similar queries returning the same end results but doing different bits along the way. However, I keep getting an error saying that Map cannot be converted to SQL, which is fine as I don't want it to be; I understand that, as it is in an expression tree, the provider would try to convert it.
{"Method 'SomeData Map(SomeTable)' has no supported translation to SQL."}
Finally, the object that is returned and mapped is passed further up the stack for other objects to use. These make use of Linq to SQL's composition abilities to add additional criteria to the query and then finally call ToList() or iterate over the data returned. However, they filter based on the mapped model, not the original table model, which I believe is perfectly fine, as answered in a previous question:
Linq2Sql point of retrieving data
So to sum it up, can I use my mapping pattern as shown without it trying to convert that single part to SQL?

Yes, you can. Put AsEnumerable() before the last Select:
var results = SomeRepository.HybridQuery(someDataQuery)
.Where(x => x.SomeColumn == 1 || x.SomeColumn == 2)
.OrderByDescending(x => x.SomeOtherColumn)
.AsEnumerable()
.Select(x => someDataMapper.Map(x));
Please note, however, that the second Where - the one that operates on SomeMappedColumn - will now be executed in memory and not by the database. If this last where clause significantly reduces the result set this could be a problem.
An alternate approach would be to create a method that returns the expression tree of that mapping. Something like the following should work, as long as everything happening in the mapping is convertible to SQL.
Expression<Func<EntityType, Car>> GetCarMappingExpression()
{
    // A lambda converts implicitly to an expression tree;
    // Expression<TDelegate> has no public constructor to call with "new".
    return x => new Car
    {
        Id = x.Id,
        Name = x.Name,
        Owner = x.FirstName + x.Surname
    };
}
Usage would be like this:
var results = SomeRepository.HybridQuery(someDataQuery)
.Where(x => x.SomeColumn == 1 || x.SomeColumn == 2)
.OrderByDescending(x => x.SomeOtherColumn)
.Select(GetCarMappingExpression());
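As a rough, self-contained illustration of the expression-returning mapper, the sketch below uses an in-memory AsQueryable() source in place of the real repository. CarRow, Car, CarMapping, and OwnedBy are hypothetical names, not part of the original code. The point it shows is that the projection and a later filter on the mapped type stay composed on a single IQueryable; whether a given provider can translate that later filter to SQL still depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical table row and domain model, for illustration only.
public class CarRow
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string FirstName { get; set; }
    public string Surname { get; set; }
}

public class Car
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Owner { get; set; }
}

public static class CarMapping
{
    // The compiler turns this lambda into an expression tree, not a delegate.
    public static Expression<Func<CarRow, Car>> ToCar()
    {
        return x => new Car
        {
            Id = x.Id,
            Name = x.Name,
            Owner = x.FirstName + x.Surname
        };
    }

    // Projects with the shared mapping, then filters on the mapped model.
    public static IQueryable<Car> OwnedBy(IQueryable<CarRow> rows, string owner)
    {
        return rows.Select(ToCar()).Where(c => c.Owner == owner);
    }

    public static void Main()
    {
        var rows = new List<CarRow>
        {
            new CarRow { Id = 1, Name = "Mini", FirstName = "Ada", Surname = "Lovelace" },
            new CarRow { Id = 2, Name = "Beetle", FirstName = "Alan", Surname = "Turing" }
        }.AsQueryable();

        foreach (var car in OwnedBy(rows, "AdaLovelace"))
        {
            Console.WriteLine(car.Name); // Mini
        }
    }
}
```

Because the mapping is a single expression, it can be reused by every query that returns the same domain type, which is the reuse the mapper class was meant to provide.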
