Our company's tables were created with fixed-width character fields that pad values with trailing spaces.
I don't have access/permissions to make changes to the DB.
However, I noticed that when I create LINQ queries using the Trim() function, the performance decreases quite a bit.
A query as simple as this shows the performance decrease:
Companies
.Where(c => c.CompanyName.Equals("Apple"))
.Select(c => new {
Tick = c.Ticker.Trim(),
Address = c.Address.Trim()
});
Is there a way to change the query so that there is no loss in performance?
Or does this rest solely with my DBA?
A quick solution is to pad your company name before giving it to the query. For example, if the column is char(50):
var paddedName = "Apple".PadRight(50);
var result = Companies
.Where(c => c.CompanyName.Equals(paddedName))
.Select(c => new {
Tick = c.Ticker.Trim(),
Address = c.Address.Trim()
});
However, you should consider correcting the database to avoid further issues.
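As a plain-C# illustration of why the padding works (a minimal sketch with no database involved, assuming the column really is char(50)): PadRight reproduces the value the fixed-width column stores, so the padded search term matches exactly and no Trim()/LTRIM/RTRIM is needed in the query.

```csharp
using System;

class PaddingDemo
{
    static void Main()
    {
        // Sketch: this is the value a char(50) column would store for "Apple"
        // (the original 5 characters plus 45 trailing spaces).
        string storedValue = "Apple".PadRight(50);

        // Comparing against the raw term fails...
        Console.WriteLine(storedValue == "Apple");              // False
        // ...but padding the term to the column width matches exactly,
        // so the comparison needs no trimming on either side.
        Console.WriteLine(storedValue == "Apple".PadRight(50)); // True
    }
}
```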
I haven't measured the performance, but another option is to use a LIKE-style query (StartsWith) as a first-round filter, materialize it with .ToList(), and then do the exact equality check in memory on the second round, without another call to the database.
var result = (Companies
.Where(c => c.CompanyName.StartsWith("Apple"))
.Select(c => new
{
c.CompanyName,
Tick = c.Ticker.Trim(),
Address = c.Address.Trim()
})).ToList();
var result1 = result
.Where(c => c.CompanyName.Trim().Equals("Apple"));
Unlike Entity Framework, LINQ to SQL can sometimes switch to LINQ to Objects under the hood when it encounters method calls that can't be translated to SQL. So if you do
....
.Select(c => new {
Tick = c.Ticker.TrimEnd().TrimStart(),
Address = c.Address.TrimEnd().TrimStart()
})
you will notice that the generated SQL no longer contains LTRIM(RTRIM()), only the field name, and that the trims are executed in client memory. Apparently LTRIM(RTRIM()) leads to a less efficient query plan, surprising as that may be.
Maybe only TrimEnd() suffices if there are no leading spaces.
Further, I fully agree with p.s.w.g. that you should go out of your way to clean up the database instead of fixing bad data in queries. If you can't do this job yourself, find the right people and twist their arms.
I had a bad day trying to improve the performance of the following query in C#, using Entity Framework (the information is stored in SQL Server, and the structure uses a Code First approach, but that does not matter at this point):
Bad performance query:
var projectDetail = await _context
.ProjectDetails
.Where(pd => projectHeaderIds.Contains(pd.IdProjectHeader))
.Include(pd => pd.Stage)
.Include(pd => pd.ProjectTaskStatus)
.GroupBy(g => new { g.IdProjectHeader, g.IdStage, g.Stage.StageName })
.Select(pd => new
{
pd.Key.IdProjectHeader,
pd.Key.IdStage,
pd.Key.StageName,
TotalTasks = pd.Count(),
MissingCriticalActivity = pd.Count(t => t.CheckTask.CriticalActivity && t.ProjectTaskStatus.Score != 100) > 0,
Score = Math.Round(pd.Average(a => a.ProjectTaskStatus.Score), 2),
LastTaskCompleted = pd.Max(p => p.CompletionDate)
}).ToListAsync();
After some hours, I figured out the problem and was able to fix the performance (instead of taking more than 4 minutes, the new query takes only 1-2 seconds):
New query:
var groupTotalTasks = await _context
.ProjectDetails
.Where(pd => projectHeaderIds.Contains(pd.IdProjectHeader))
.Select(r => new
{
r.IdProjectHeader,
r.CompletionDate,
r.IdStage,
r.ProjectTaskStatus.Score,
r.CheckTask.CriticalActivity,
r.Stage.StageName
})
.GroupBy(g => new { g.IdProjectHeader, g.IdStage, g.StageName })
.Select(pd => new
{
pd.Key.IdProjectHeader,
pd.Key.IdStage,
pd.Key.StageName,
TotalTasks = pd.Count(),
MissingCriticalActivity = pd.Count(r => r.CriticalActivity && r.Score != 100) > 0,
Score = Math.Round(pd.Average(a => a.Score), 2),
LastTaskCompleted = pd.Max(p => p.CompletionDate)
}).ToListAsync();
The steps to improve the query were the following:
Avoid nested validations (like Score, which used MainQuery.ProjectTaskStatus.Score to calculate the average)
Avoid Include in the queries
Use a Select to retrieve only the information that the GroupBy will use afterwards
Those changes fixed my issue, but, why?
And is there still another way to improve this query?
Specifically, why does the use of nested validations make the query so extremely slow?
The other changes make more sense to me.
I recently read that whenever EF Core 2 ran into anything that it couldn't produce a SQL Query for, it would switch to in-memory evaluation. So the first query would basically be pulling all of your ProjectDetails out of the database, then doing all the grouping and such in your application's memory. That's probably the biggest issue you had.
Using .Include had a big impact in that case, because you were including a bunch of other data when you pulled out all those ProjectDetails. It probably has little to no impact now that you've avoided doing all that work in-memory.
The EF team realized the error of their ways and changed the behavior to throw an exception in such cases, starting with EF Core 3.
To avoid problems like this in the future, you can upgrade to EF Core 3, or just be really careful to ensure Entity Framework can translate everything in your query to SQL.
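To see the project-then-group shape in isolation, here is a hedged in-memory sketch (LINQ to Objects, with made-up rows standing in for ProjectDetails) of the same aggregation; the point is that every operator in this chain is also translatable to SQL, so nothing falls back to client evaluation:

```csharp
using System;
using System.Linq;

class GroupDemo
{
    static void Main()
    {
        // Hypothetical flat rows, already projected down to only the
        // fields the GroupBy needs (mirroring the fixed query's Select).
        var rows = new[]
        {
            new { IdProjectHeader = 1, IdStage = 1, StageName = "Design", Score = 100.0, CriticalActivity = true },
            new { IdProjectHeader = 1, IdStage = 1, StageName = "Design", Score = 50.0,  CriticalActivity = true },
            new { IdProjectHeader = 1, IdStage = 2, StageName = "Build",  Score = 100.0, CriticalActivity = false }
        };

        var grouped = rows
            .GroupBy(r => new { r.IdProjectHeader, r.IdStage, r.StageName })
            .Select(g => new
            {
                g.Key.StageName,
                TotalTasks = g.Count(),
                MissingCriticalActivity = g.Count(r => r.CriticalActivity && r.Score != 100) > 0,
                Score = Math.Round(g.Average(r => r.Score), 2)
            })
            .ToList();

        Console.WriteLine(grouped[0].StageName);               // Design
        Console.WriteLine(grouped[0].Score);                   // 75
        Console.WriteLine(grouped[0].MissingCriticalActivity); // True
    }
}
```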
Is there a simpler way to write this query in Linq:
var prioritiy = db.Requirements.Where(r => r.ID == rowID).Select(r => r.Priority).First();
If you mean "simpler" as in "less code", your self-answer is already the most compact:
db.Requirements.First(r => r.ID == rowID).Priority;
If you mean "simpler" as in "less database overhead", then your original version is slightly better:
db.Requirements.Where(r => r.ID == rowID).Select(r => r.Priority).First();
Why? As @IvanStoev pointed out in the comments, LINQ execution is deferred until you call a "finishing" method like First(). If you're using SQL on the backend, the second example will be translated into a SQL statement that retrieves only the Priority field from the database, whereas the first example will retrieve all fields of the matching row.
This is, IMO, firmly in the realm of unnecessary micro-optimizations, unless this code runs millions of times or the full database object has tons of columns. Unless you're doing something crazy, just use the style that you like!
Never mind. I just figured out that by applying First() initially, I return an object which contains the property I'm looking for. The code turns into:
var priority = db.Requirements.First(r => r.ID == rowID).Priority;
A safer version, using the null-conditional operator (available since C# 6 / Visual Studio 2015):
var priority = db.Requirements.FirstOrDefault(r => r.ID == rowID)?.Priority;
Or, if you call that often, you can use a lookup:
var lookup = db.Requirements.ToLookup(r => r.ID, r => r.Priority);
var priority = lookup[rowID].FirstOrDefault();
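An in-memory sketch of the lookup approach, with a hypothetical Requirement type standing in for the table (note the caveat: ToLookup on db.Requirements pulls the whole table into memory, so this only pays off when you reuse the lookup for many IDs):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal stand-in for the Requirements table row.
record Requirement(int ID, string Priority);

class LookupDemo
{
    static void Main()
    {
        var requirements = new List<Requirement>
        {
            new(1, "High"), new(2, "Low"), new(3, "Medium")
        };

        // One pass builds the lookup; every later read is an O(1) probe
        // and, against a real database, hits no further round trips.
        var lookup = requirements.ToLookup(r => r.ID, r => r.Priority);

        Console.WriteLine(lookup[2].FirstOrDefault()); // Low
        // A missing key yields an empty sequence, not an exception:
        Console.WriteLine(lookup[99].Any());           // False
    }
}
```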
Using Entity Framework, which of these is faster in theory:
// (1) sort then select/project
// in db, for entire table
var results = someQuery
.OrderBy(q => q.FieldA)
.Select(q => new { q.FieldA, q.FieldB })
.ToDictionary(q => q.FieldA, q => q.FieldB);
or
// (2) select/project then sort
// in db, on a smaller data set
var results = someQuery
.Select(q => new { q.FieldA, q.FieldB })
.OrderBy(q => q.FieldA)
.ToDictionary(q => q.FieldA, q => q.FieldB);
or
// (3) select/project then materialize then sort
// in object space
var results = someQuery
.Select(q => new { q.FieldA, q.FieldB })
.ToDictionary(q => q.FieldA, q => q.FieldB)
.OrderBy(q => q.FieldA); // -> this won't compile, but you get the question
I'm no SQL expert, but it intuitively seems that 2 is faster than 1... is that correct? And how does that compare to 3, because in my experience with EF almost everything is faster when done on the db.
PS I have no perf tools in my environment, and not sure how to test this, hence the question.
Your query is compiled and executed at the moment you call ToDictionary, so 1 and 2 should be the same and produce the same query: you get SELECT FieldA, FieldB FROM table ORDER BY FieldA in both cases.
The third is different: you first execute the SQL query (without the ORDER BY clause), then you sort the returned set in memory (the data is sorted by the client, not the database provider). This might be faster or slower depending on the amount of data, the server's and client's hardware, how your database is designed (indexes, etc.), the network infrastructure, and so on.
There's no way to tell which one will be faster with the information you provided.
PS: option 3 makes no sense anyway, as a Dictionary doesn't guarantee any order (and it won't compile, since after ToDictionary the elements are KeyValuePairs, which have no FieldA property), but change ToDictionary to ToList and there's your performance answer.
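A quick LINQ to Objects check of that point: ToList preserves the sort order, while a Dictionary is accessed by key, so any OrderBy in front of ToDictionary is wasted work.

```csharp
using System;
using System.Linq;

class OrderDemo
{
    static void Main()
    {
        var items = new[]
        {
            new { FieldA = 3, FieldB = "c" },
            new { FieldA = 1, FieldB = "a" },
            new { FieldA = 2, FieldB = "b" }
        };

        // With ToList() the sort order is preserved and observable:
        var ordered = items.OrderBy(q => q.FieldA).Select(q => q.FieldB).ToList();
        Console.WriteLine(string.Join(",", ordered)); // a,b,c

        // A Dictionary is read by key; the ordering contributes nothing:
        var dict = items.ToDictionary(q => q.FieldA, q => q.FieldB);
        Console.WriteLine(dict[2]); // b
    }
}
```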
I have a controller as below, and it takes too long to load the data. I am using the Contains and ToList() methods, and I have heard about the poor performance of ToList().
How can I change this approach to get better performance?
public List<decimal> GetOrgSolution()
{
//Need to use USER id. but we have EMPNO in session.
var Users = db.CRM_USERS.Where(c => c.ID == SessionCurrentUser.ID || RelOrgPerson.Contains(c.EMPNO.Value)).Select(c => c.ID);
//Get the organization list regarding to HR organization
var OrgList = db.CRM_SOLUTION_ORG.Where(c => c.USER_ID == SessionCurrentUser.ID || Users.Contains(c.USER_ID.Value)).Select(c => c.ID).ToList();
//Get related solutions ID with the OrgList
List<decimal> SolutionList = db.CRM_SOLUTION_OWNER.Where(p => OrgList.Contains(p.SOLUTION_ORG_ID.Value)).Select(c => (decimal)c.SOLUTION_ID).Distinct().ToList();
return SolutionList;
}
You might be able to speed this up by dropping the ToList() from the OrgList query. That way it uses deferred execution rather than pulling all the records for the org list up front. However, if there is no match in the query that calls Contains(), it will still have to load everything.
public List<decimal> GetOrgSolution()
{
//Need to use USER id. but we have EMPNO in session.
var Users = db.CRM_USERS.Where(c => c.ID == SessionCurrentUser.ID || RelOrgPerson.Contains(c.EMPNO.Value)).Select(c => c.ID);
//Get the organization list regarding to HR organization
var OrgList = db.CRM_SOLUTION_ORG.Where(c => c.USER_ID == SessionCurrentUser.ID || Users.Contains(c.USER_ID.Value)).Select(c => c.ID);
//Get related solutions ID with the OrgList
List<decimal> SolutionList = db.CRM_SOLUTION_OWNER.Where(p => OrgList.Contains(p.SOLUTION_ORG_ID.Value)).Select(c => (decimal)c.SOLUTION_ID).Distinct().ToList();
return SolutionList;
}
Unless the lists you're working with are really huge, it's highly unlikely that calling ToList is the major bottleneck in your code. I'd be much more inclined to suspect the database (assuming you're doing LINQ-to-SQL). Or, your embedded Contains calls. You have, for example:
db.CRM_SOLUTION_ORG.Where(
c => c.USER_ID == SessionCurrentUser.ID || Users.Contains(c.USER_ID.Value))
So for every item in db.CRM_SOLUTION_ORG that fails the test against SessionCurrentUser, you're going to do a sequential search of the Users list.
Come to think of it, because Users is lazily evaluated, you're going to execute that Users query every time you call Users.Contains. It looks like your code would be much more efficient in this case if you called ToList() on the Users. That way the query is only executed once.
And you probably should keep the ToList() on the OrgList query. Otherwise you'll be re-executing that query every time you call OrgList.Contains.
That said, if Users or OrgList could have a lot of items, then you'd be better off turning them into HashSets so that you get O(1) lookup rather than O(n) lookup.
But looking at your code, it seems like you should be able to do all of this with a single query using joins, and let the database server take care of it. I don't know enough about LINQ to SQL or your data model to say for sure, but from where I'm standing it sure looks like a simple join of three tables.
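Purely as a hedged sketch of that idea, here is the three-table join expressed over in-memory stand-ins (hypothetical minimal record types; against LINQ to SQL, the same query shape would compose into a single SQL statement instead of three round trips):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal stand-ins for the three tables.
record User(decimal ID, int EMPNO);
record SolutionOrg(decimal ID, decimal? USER_ID);
record SolutionOwner(decimal? SOLUTION_ORG_ID, decimal SOLUTION_ID);

class JoinDemo
{
    static void Main()
    {
        var currentUserId = 1m;
        var relOrgPerson = new HashSet<int> { 200 };

        var users  = new List<User> { new(1, 100), new(2, 200), new(3, 300) };
        var orgs   = new List<SolutionOrg> { new(10, 1), new(11, 2), new(12, 3) };
        var owners = new List<SolutionOwner> { new(10, 500), new(11, 600), new(12, 700), new(11, 600) };

        // One composed query replacing the three sequential queries:
        var solutionIds =
            (from owner in owners
             join org in orgs on owner.SOLUTION_ORG_ID equals (decimal?)org.ID
             join user in users on org.USER_ID equals (decimal?)user.ID
             where user.ID == currentUserId || relOrgPerson.Contains(user.EMPNO)
             select owner.SOLUTION_ID)
            .Distinct()
            .ToList();

        Console.WriteLine(string.Join(",", solutionIds)); // 500,600
    }
}
```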
I have got a bit of an issue and was wondering if there is a way to have my cake and eat it.
Currently I have a Repository and Query style pattern for how I am using Linq2Sql, however I have got one issue and I cannot see a nice way to solve it. Here is an example of the problem:
var someDataMapper = new SomeDataMapper();
var someDataQuery = new GetSomeDataQuery();
var results = SomeRepository.HybridQuery(someDataQuery)
.Where(x => x.SomeColumn == 1 || x.SomeColumn == 2)
.OrderByDescending(x => x.SomeOtherColumn)
.Select(x => someDataMapper.Map(x));
return results.Where(x => x.SomeMappedColumn == "SomeType");
The main bits to pay attention to here are the Mapper, Query, and Repository, and then the final Where clause. I am doing this as part of a larger refactor, and we found that there were a lot of similar queries which got slightly different result sets back but then mapped them the same way to a domain-specific model. Take, for example, getting back a tbl_car and mapping it to a Car object. A mapper basically takes one type and spits out another, exactly the same as what would normally happen in the select:
// Non mapped version
select(x => new Car
{
Id = x.Id,
Name = x.Name,
Owner = x.FirstName + x.Surname
});
// Mapped version
select(x => carMapper.Map(x));
So the car mapper is more reusable across all areas which run similar queries returning the same end results while doing different bits along the way. However, I keep getting an error saying that Map cannot be converted to SQL. That is fine, as I don't want it to be, but I understand that because it sits inside the expression tree, LINQ to SQL tries to translate it:
{"Method 'SomeData Map(SomeTable)' has no supported translation to SQL."}
Finally, the mapped object that is returned is passed further up the stack for other objects to use. These make use of LINQ to SQL's composition abilities to add additional criteria to the query, and finally ToList() or iterate over the data returned. However, they filter based on the mapped model, not the original table model, which I believe is perfectly fine, as answered in a previous question:
Linq2Sql point of retrieving data
So to sum it up, can I use my mapping pattern as shown without it trying to convert that single part to SQL?
Yes, you can. Put AsEnumerable() before the last Select:
var results = SomeRepository.HybridQuery(someDataQuery)
.Where(x => x.SomeColumn == 1 || x.SomeColumn == 2)
.OrderByDescending(x => x.SomeOtherColumn)
.AsEnumerable()
.Select(x => someDataMapper.Map(x));
Please note, however, that the second Where - the one that operates on SomeMappedColumn - will now be executed in memory and not by the database. If this last where clause significantly reduces the result set this could be a problem.
An alternate approach would be to create a method that returns the expression tree of that mapping. Something like the following should work, as long as everything happening in the mapping is convertible to SQL.
Expression<Func<EntityType, Car>> GetCarMappingExpression()
{
return x => new Car
{
Id = x.Id,
Name = x.Name,
Owner = x.FirstName + x.Surname
};
}
Usage would be like this:
var results = SomeRepository.HybridQuery(someDataQuery)
.Where(x => x.SomeColumn == 1 || x.SomeColumn == 2)
.OrderByDescending(x => x.SomeOtherColumn)
.Select(GetCarMappingExpression());