LINQ query preferences - C#

I'm learning a bit about LINQ. I have the following code:
(Please excuse the pathetic size of the data set)
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // parse "dd/MM/yyyy" dates with an explicit UK culture so this runs under any locale
        var ukCulture = CultureInfo.GetCultureInfo("en-GB");
        var employees = new List<Employee>
        {
            new Employee
            {
                Name = "Bill Bailey",
                EmployeeCode = 12345,
                Department = "Comedy Lab",
                DateOfBirth = DateTime.Parse("13/01/1964", ukCulture),
                CurrentEmployee = true
            },
            new Employee
            {
                Name = "Boris Johnson",
                EmployeeCode = 56789,
                Department = "Cycling Dept.",
                DateOfBirth = DateTime.Parse("19/06/1964", ukCulture),
                CurrentEmployee = true
            },
            new Employee
            {
                Name = "Bruce Forsyth",
                EmployeeCode = 5,
                Department = "Comedy Lab",
                DateOfBirth = DateTime.Parse("22/03/1928", ukCulture),
                CurrentEmployee = false
            },
            new Employee
            {
                Name = "Gordon Brown",
                EmployeeCode = 666,
                Department = "Backbenches",
                DateOfBirth = DateTime.Parse("20/02/1951", ukCulture),
                CurrentEmployee = false
            },
            new Employee
            {
                Name = "Russell Howard",
                EmployeeCode = 46576,
                Department = "Comedy Lab",
                DateOfBirth = DateTime.Parse("23/03/1980", ukCulture),
                CurrentEmployee = false
            }
        };

        // anyone whose 65th birthday has already passed counts as an OAP
        Func<Employee, bool> oapCalculator = employee => employee.DateOfBirth.AddYears(65) < DateTime.Now;

        var oaps1 = employees.Where(oapCalculator);

        var oaps2 = from employee in employees
                    where oapCalculator(employee)
                    select employee;

        oaps1.ToList().ForEach(employee => Console.WriteLine(employee.Name));
        oaps2.ToList().ForEach(employee => Console.WriteLine(employee.Name));
        Console.ReadLine();
    }

    class Employee
    {
        public string Name { get; set; }
        public int EmployeeCode { get; set; }
        public string Department { get; set; }
        public DateTime DateOfBirth { get; set; }
        public bool CurrentEmployee { get; set; }
    }
}
I have a few questions. As far as I can tell, both of the featured LINQ queries are doing the same thing (black magic may be afoot).

1. Would they both be compiled down to the same IL?
2. If not, why not, and which would be the most efficient given a sizable amount of data?
3. What is the best way to monitor LINQ query efficiency? Performance timers, or something built-in?
4. Is the lambda expression the preferred method, as it is the most concise?
5. In a department of lambda-fearing Luddites, is it worth taking the plunge and teaching 'em up, or using the SQL-esque syntax?
Thanks

Re:
var oaps1 = employees.Where(oapCalculator);
vs
var oaps2 = (from employee in employees
             where oapCalculator(employee)
             select employee);
There is a slight difference, in particular around the where oapCalculator(employee). The second query is mapped to:
var oaps2 = employees.Where(employee => oapCalculator(employee));
so this is an extra layer of delegate, and will also incur the (small) overhead of a capture-class due to the closure over the variable oapCalculator, and a dereference of this per iteration. But otherwise they are the same. In particular, the Select is trivially removed (in accordance with the spec).
In general, use whichever is clearest in any scenario. In this case, either seems fine, but you will find it easier to use .Where etc. if you are regularly dealing with scenarios that involve delegates or Expressions.
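To make the capture-class point concrete, the closure over oapCalculator compiles to something roughly like the sketch below (illustrative names only; the real compiler-generated type has an unspeakable name along the lines of <>c__DisplayClass):

// compiler-generated holder for the captured local (name hypothetical)
class CaptureClass
{
    public Func<Employee, bool> oapCalculator;

    public bool Invoke(Employee employee)
    {
        return this.oapCalculator(employee); // the per-iteration dereference mentioned above
    }
}

// the query-syntax version is then morally equivalent to:
// var capture = new CaptureClass { oapCalculator = oapCalculator };
// var oaps2 = employees.Where(capture.Invoke);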

I don't mean this to be snide, but sometimes it is better to try things out for yourself. Along those lines, here are some tools, and some of my own experiences.
1 and 2: Disassemble and find out! :) http://www.red-gate.com/products/reflector/
3: Profile your app. This is the answer to any perf-determining question, unless you're doing algorithm work (mathematical proofs, big-O). Profiling tools are built into VS, and for quick-and-dirty timings there's Stopwatch (see the sketch at the end of this answer).
4: Which do you prefer? How about your co-workers? This sounds like a statistical question, which would require a survey.
5: Similar to 4, try it and find out! As you may have experienced, evangelizing new techniques to your co-workers will teach you as much as it will teach them.
I've found I've had about a 50% success rate w/ teaching general delegate/lambda usage. I made sure to come up with practical examples from my production test code, and showed how the equivalent imperative code had lots of duplication.
I tried going through the free SICP videos with my team (a real eye-opener on refactoring), and I found it a pretty hard sell. LISP isn't the most attractive language to the majority of programmers...
http://groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures/
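Re point 3, here is a minimal Stopwatch sketch for rough timings (my own illustration, reusing employees and oapCalculator from the question; run a Release build and repeat many times before trusting the numbers):

var sw = System.Diagnostics.Stopwatch.StartNew();
var viaMethod = employees.Where(oapCalculator).ToList(); // ToList() forces the lazy query to execute
sw.Stop();
Console.WriteLine("Method syntax: {0} ticks", sw.ElapsedTicks);

sw.Restart();
var viaQuery = (from e in employees
                where oapCalculator(e)
                select e).ToList();
sw.Stop();
Console.WriteLine("Query syntax: {0} ticks", sw.ElapsedTicks);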

Both LINQ queries are equivalent. The second uses syntactic sugar that the compiler translates to an expression similar to your first query before compiling. As far as what is preferred, use whatever seems more readable to you and your team.

How do you post a many-to-many-to-many relation in a REST API?

Using EF Core.
We are trying to obtain all the information for an assessment, which includes its groups and all assigned users. See the
Database Diagram
What is working, in the following order:
HttpPost (api/Assessment/aID/groups) of an empty group to an assessment
HttpPost (api/Group/gID/users) of users to an existing group
What we are trying to accomplish (the code referenced is a different example, but the same principle):
HttpPost (api/Assessment/aID/groups) where the group already contains a list of users. When we try this, we get "a possible object cycle was detected which is not supported".
This piece of code is currently throwing a NullReferenceException on Address
----------------------------------------------------------------------------
Group groupToCreate = new Group { Name = dto.Name, Description = dto.Description };
foreach (var u in dto.Users)
{
    // u.Address may be null here; see the guard sketch below
    groupToCreate.AddUser(new User
    {
        Name = u.Name,
        Email = u.Email,
        Address = new Address
        {
            Country = u.Address.Country,
            City = u.Address.City,
            PostalCode = u.Address.PostalCode,
            Street = u.Address.Street,
            HouseNr = u.Address.HouseNr,
            BusNr = u.Address.BusNr
        }
    });
}
_groupRepository.Add(groupToCreate);
_groupRepository.SaveChanges();
return groupToCreate;
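A hedged guess at the NullReferenceException: if the posted JSON omits the address for some user, u.Address will be null when the initializer above runs. A minimal guard, assuming User tolerates a null Address, might be:

foreach (var u in dto.Users)
{
    // build the address only when one was actually posted (hypothetical guard)
    Address address = null;
    if (u.Address != null)
    {
        address = new Address
        {
            Country = u.Address.Country,
            City = u.Address.City,
            PostalCode = u.Address.PostalCode,
            Street = u.Address.Street,
            HouseNr = u.Address.HouseNr,
            BusNr = u.Address.BusNr
        };
    }
    groupToCreate.AddUser(new User { Name = u.Name, Email = u.Email, Address = address });
}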
HttpGet (api/Assessment) which displays its assigned groups and linked users.
This seems to be working
------------------------
groupList = _groups.Select(g => new GroupDTO
{
    Name = g.Name,
    Description = g.Description,
    Users = g.GroupUsers.Select(u => new UserDTO
    {
        Name = u.User.Name,
        Email = u.User.Email,
        Address = new AddressDTO
        {
            Country = u.User.Address.Country,
            City = u.User.Address.City,
            PostalCode = u.User.Address.PostalCode,
            Street = u.User.Address.Street,
            HouseNr = u.User.Address.HouseNr,
            BusNr = u.User.Address.BusNr
        }
    }).ToList()
}).ToList();
References:
User
Group
Assessment
AssessmentRepo
Hard to tell with the details you're providing, but I'm guessing this is due to having two-way navigation properties. Are you using EF here?
For example, if your User has a navigation property allowing access to the user's Group, and a Group has a collection of User objects, then each of those users would themselves have the Group expressed within them... when trying to serialise this, it could easily get stuck in a cycle, e.g. a user would look like:
{
    "Name": "user name",
    "Group": {
        "Name": "group1",
        "Users": [
            {
                "Name": "user name",
                "Group": {
                    "Name": "group1",
                    "Users": [
                        ....
                    ]
                }
            }
        ]
    }
}
.. because a User has a Group, and the Group has a list of User objects, and each one of those has a Group... etc.
This is the sort of issue that comes from mixing your Data layer and DTO objects. Change your system so the objects returned by your REST methods are new objects designed for the requirements of the API/front-end. These objects may look very similar to your DB models (at least initially) but they should not be the same objects.
Create entirely new objects which don't have any logic or navigation properties, and exist only to pass information back to API consumers. For example, a simple class to give a list of user groups and the users in those groups may be defined as:
public class UserDto
{
    public string UserName { get; set; }
    public IEnumerable<string> Groups { get; set; }
}

public class UserListDto
{
    public IEnumerable<UserDto> Users { get; set; }
}
And then your controller action could do something like:
var users = userService.GetAllUsers();
var result = new UserListDto
{
    Users = users.Select(u => new UserDto
    {
        UserName = u.Name,
        Groups = u.Groups.Select(g => g.Name)
    })
};
return Ok(result);
..So the thing being serialised for the response doesn't have any complicated relationships to negotiate. More importantly, a change to how you internally store and work with the data won't affect the external contract of your API: consumers continue to see exactly the same information, while how you store and compile it can change drastically.
It is tempting to think "The data I need to return is basically the same as how I store it internally, so just re-use these classes", but that's not a great idea & will only ever give problems in the long run.
To avoid having to (re-)write a lot of code to 'convert' one object into another, I'd recommend looking into something like AutoMapper, as it makes those conversions reusable & lets you put all this 'translation' stuff in one place.
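For illustration, a minimal AutoMapper sketch mapping the User entity to the UserDto above (the configuration details are my assumption, not code from the question):

using System.Linq;
using AutoMapper;

// configure once at startup
var config = new MapperConfiguration(cfg =>
    cfg.CreateMap<User, UserDto>()
       .ForMember(d => d.UserName, opt => opt.MapFrom(s => s.Name))
       .ForMember(d => d.Groups, opt => opt.MapFrom(s => s.Groups.Select(g => g.Name))));
IMapper mapper = config.CreateMapper();

// then in the controller action:
var result = new UserListDto
{
    Users = mapper.Map<IEnumerable<UserDto>>(userService.GetAllUsers())
};
return Ok(result);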

DB first Entity Framework query incredibly slow

I am new to databases, and to EF. I am using EF within an ASP.NET Core MVC project. The implementation code below is from a Controller, aiming to combine data from two tables into a summary.
The database has tables: Batch, Doc.
Batch has many columns, including: int BatchId, string BatchEnd. BatchEnd is a consistently formatted date/time string, e.g. 23/09/2016 14:33:21
Doc has many columns including: string BatchId, string HardCopyDestination. Many Docs can refer to the same BatchId, but all Docs that do so have the same value for HardCopyDestination.
I want to populate the following ViewModel
public class Batch
{
    public int BatchId { get; set; }
    public string Time { get; set; } // from BatchEnd
    public string HardCopyDestination { get; set; }
}
But my current query, below, is running dog slow. Have I implemented this correctly?
var BatchViewModels = new List<Batch>();

// this is fine
var batches = _context.BatchTable.Where(
    b => b.BatchEnd.Contains(
        DateTime.Now.Date.ToString("dd/MM/yyyy")));

// this bit disappears down a hole
foreach (var batch in batches)
{
    var doc = _context.DocTable.FirstOrDefault(
        d => d.BatchId == batch.BatchId.ToString());
    if (doc != null)
    {
        var newBatchVM = new Batch
        {
            BatchId = batch.BatchId,
            Time = batch.BatchEnd.Substring(11), // time portion of "dd/MM/yyyy HH:mm:ss"
            HardCopyDestination = doc.HardCopyDestination
        };
        BatchViewModels.Add(newBatchVM);
    }
}
return View(BatchViewModels);
I think you're hitting the database once per batch. If you have many batches, that is expensive. You can get all the documents in one go from the db.
var batchDict = batches.ToDictionary(b => b.BatchId);
var documents = _context.DocTable.Where(doc => batchDict.Keys.Contains(doc.BatchId));
BatchViewModels.AddRange(documents.Select(d => new Batch
{
    BatchId = d.BatchId,
    Time = batchDict[d.BatchId].BatchEnd.Substring(11), // you only want the time portion?
    HardCopyDestination = d.HardCopyDestination
}));
By the way, Igor is right about dates, and in addition, if BatchId is an int in BatchTable, then it should be an int in DocTable as well. In the above code I assume they are the same type, but it shouldn't be hard to change if they aren't.
Igor is also right that profiling the db is a good way to see what the problem is; I'm just taking a guess based on your code.
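As a further sketch (mine, not from the answer above), the whole lookup can be pushed to the database with a single join. Whether b.BatchId.ToString() translates to SQL depends on your EF version and provider, so treat this as an outline:

var todaysKey = DateTime.Now.Date.ToString("dd/MM/yyyy");
var rows = (from b in _context.BatchTable
            where b.BatchEnd.Contains(todaysKey)
            join d in _context.DocTable
                on b.BatchId.ToString() equals d.BatchId
            select new { b.BatchId, b.BatchEnd, d.HardCopyDestination })
           .ToList();

// many docs can share a batch but carry the same HardCopyDestination,
// so collapse the duplicates client-side
var viewModels = rows
    .GroupBy(r => r.BatchId)
    .Select(g => new Batch
    {
        BatchId = g.Key,
        Time = g.First().BatchEnd.Substring(11),
        HardCopyDestination = g.First().HardCopyDestination
    })
    .ToList();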

Using First() get the 2nd item of LINQ result?

I'm new to LINQ and Entity Framework. This is a sample program I came across while learning them.
The data in the table is like this:
BlogId  Title
1       Hello Blog
2       New Blog
3       New Blog
I have the following LINQ code, trying to read the first blog id (expected to be 2):
var name = "New Blog";
var blogs = (from b in db.Blogs
             where b.Title == name
             orderby b.Title
             select b); //.ToList();
Console.Write("The first id: ");
Console.WriteLine(blogs.First().BlogId);
The result comes out to be 3.
Then I use ToList():
var blogs = (from b in db.Blogs
             where b.Title == name
             orderby b.Title
             select b).ToList();
Console.Write("The first id: ");
Console.WriteLine(blogs.First().BlogId);
The result comes out to be 2.
Can anyone help to explain this? Or is this a bug?
Thanks.
//////////////////////// UPDATE /////////////////////////////
I just deleted the data in the database and inserted some new items. Now the table is like this:
BlogId  Title
5       New Blog
6       New Blog
7       New Blog
8       New Blog
Then I ran the program above (without ToList()), and the First() method returned id 6.
So I assume the method always returns the 2nd item in the situation above. And it doesn't seem to have anything to do with the RDBMS. Can anyone explain?
Thanks.
/////////////////////////////////////////////////////
FYI, the following is the whole .cs file:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.Entity;
using System.ComponentModel.DataAnnotations;

namespace SampleNew
{
    class Program
    {
        public class Blog
        {
            [Key]
            public Int32 BlogId { get; set; }
            public String Title { get; set; }
            public virtual List<Post> Posts { get; set; }
        }

        public class Post
        {
            [Key]
            public Int32 PostId { get; set; }
            public String Title { get; set; }
            public String Content { get; set; }
        }

        public class BlogContext : DbContext
        {
            public DbSet<Blog> Blogs { get; set; }
            public DbSet<Post> Posts { get; set; }
        }

        static void Main(string[] args)
        {
            using (var db = new BlogContext())
            {
                // Create and save a new Blog
                // Console.Write("Enter a name for a new Blog: ");
                var name = "New Blog";
                //var blog = new Blog { Title = name };
                var blogs = (from b in db.Blogs
                             where b.Title == name
                             orderby b.Title
                             select b).ToList();
                Console.Write("The first id: ");
                Console.WriteLine(blogs.First().BlogId);
                Console.WriteLine(blogs.Count());

                Blog blog = null;
                foreach (Blog b in blogs)
                {
                    blog = b;
                    Console.WriteLine(blog.BlogId);
                }
                Console.WriteLine(blog.BlogId);
                Console.WriteLine(blogs.First().BlogId);
                Console.WriteLine(blogs.First().BlogId);
                Console.WriteLine(blogs.Last().BlogId);
                Console.WriteLine(blogs.Last().BlogId);

                blog.Posts = new List<Post>();
                var post = new Post { Content = "Test Content2", Title = "Test Title2" };
                blog.Posts.Add(post);
                db.Posts.Add(post);
                db.SaveChanges();

                // Display all Blogs from the database
                var query = from b in db.Blogs
                            orderby b.Title
                            select b;

                Console.WriteLine("All blogs in the database:");
                foreach (var item in query)
                {
                    Console.WriteLine(item.Title);
                }

                Console.WriteLine("Press any key to exit...");
                Console.ReadKey();
            }
        }
    }
}
You've got two identical titles there, but with different IDs. Your RDBMS has the flexibility of returning the rows that correspond to your 'New Blog' in any order that it wishes, because your code does not specify anything beyond the requirement to order by the title. Moreover, it is not even required to return results in the same order each time that you run the same query.
If you would like predictable results, add a "then by" to your LINQ statement to force the ordering that you wish to have:
var query = from b in db.Blogs
            orderby b.Title, b.BlogId
            select b;
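For reference, the method-syntax equivalent (OrderBy/ThenBy are the standard operators the orderby clause compiles to):

var query = db.Blogs
              .OrderBy(b => b.Title)
              .ThenBy(b => b.BlogId);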
EDIT:
When I ran the program above, the First() method returns the id 6 so I assume the method always returns the 2nd item in the situation above. And it doesn't seem to have anything to do with the RDBMS. Can anyone explain?
That's also happening in the RDBMS, and it is reproducible without LINQ. Here is a small demo (link to sqlfiddle):
create table blogs(blogid int,title varchar(20));
insert into blogs(blogid,title) values (5,'New blog');
insert into blogs(blogid,title) values (6,'New blog');
insert into blogs(blogid,title) values (7,'New blog');
insert into blogs(blogid,title) values (8,'New blog');
SELECT * FROM Blogs ORDER BY Title
This query produces results in "natural" order:
BLOGID  TITLE
------  --------
5       New blog
6       New blog
7       New blog
8       New blog
However, this query, which is what EF runs to get the First() item in the RDBMS,
SELECT TOP 1 * FROM Blogs ORDER BY Title
returns the second row in natural order:
BLOGID  TITLE
------  --------
6       New blog
It does not mean that it is going to return the same row in other RDBMSs (link to a demo with MySQL returning a different row for the same query), or even in the same RDBMS. It simply demonstrates that LINQ relies on the RDBMS for the selection of the row, and the RDBMS returns an arbitrarily selected row.
I suspect the difference comes from the optimizations First() enables when used without ToList().
When you call ToList(), the entire ordered list must be created, so it will order everything using an efficient sort algorithm.
However, with First(), it only needs to find the min value. So it can use a much more efficient algorithm that basically goes through the enumerable once and keeps track of the current min object value. (So it returns the first object with the min value.)
This is a different algorithm than sorting the entire list, and hence it can give a different result.
Update:
Also, this being a database, it may be using LINQ to SQL, which will produce a different query based on the above description (getting a sorted list vs getting the first with the min value).

Linq to SQL creating extra sub select when doing join

I have a simple parent-child relationship that I would like to load with LINQ to SQL. I want to load the children at the same time as the parent. The generated SQL is doing too much work: it is trying to count the children as well as join to them. I will not update these objects, and I will not add children to the parent; I'm only interested in reading them. I have simplified the tables down to the bare minimum; in reality I have more columns. LINQ to SQL is generating the following SQL:
SELECT [t0].[parentId] AS [Id], [t0].[name], [t1].[childId] AS [Id2],
[t1].[parentId], [t1].[name] AS [name2],
( SELECT COUNT(*)
FROM [dbo].[linqchild] AS [t2]
WHERE [t2].[parentId] = [t0].[parentId]
) AS [value]
FROM [dbo].[linqparent] AS [t0]
LEFT OUTER JOIN [dbo].[linqchild] AS [t1] ON [t1].[parentId] = [t0].[parentId]
ORDER BY [t0].[parentId], [t1].[childId]
I don't know why the SELECT COUNT(*) ... is there. I'd rather it went away. Both the parent and child tables will have millions of rows in them in production. The extra query is costing a great deal of time. It seems unnecessary. Is there a way to make it go away? I'm not sure where the ORDER BY is coming from, either.
The classes look like this.
[Table(Name = "dbo.linqparent")]
public class LinqParent
{
    [Column(Name = "parentId", AutoSync = AutoSync.OnInsert, IsPrimaryKey = true, IsDbGenerated = true, CanBeNull = false)]
    public long Id { get; set; }

    [Column(Name = "name", CanBeNull = false)]
    public string name { get; set; }

    [Association(OtherKey = "parentId", ThisKey = "Id", IsForeignKey = true)]
    public IEnumerable<LinqChild> Kids { get; set; }
}

[Table(Name = "dbo.linqchild")]
public class LinqChild
{
    [Column(Name = "childId", AutoSync = AutoSync.OnInsert, IsPrimaryKey = true, IsDbGenerated = true, CanBeNull = false)]
    public long Id { get; set; }

    [Column(Name = "parentId", CanBeNull = false)]
    public long parentId { get; set; }

    [Column(Name = "name", CanBeNull = false)]
    public string name { get; set; }
}
I'm using something like the following to query; in production there would be a where clause and a matching index.
using (DataContext context = new DataContext(new DatabaseStringFinder().ConnectionString, new AttributeMappingSource()) { ObjectTrackingEnabled = false, DeferredLoadingEnabled = false })
{
    var loadOptions = new DataLoadOptions();
    loadOptions.LoadWith<LinqParent>(f => f.Kids);
    context.LoadOptions = loadOptions;

    var table = context.GetTable<LinqParent>();
    context.Log = Console.Out;
    // do something with table.
}
Unfortunately, no. ORMs are never the most performant solution; you'll always get better performance if you write your own SQL (or use stored procedures), but that's the tradeoff that gets made.
What you're seeing is standard practice with ORMs; rather than using a multiple-result query (which seems to me to be the most efficient way, but I'm not an ORM library author), the ORM will flatten the entire graph into a single query and bring back all of the information it needs to rebuild the graph, including information that helps it determine which bits of data are duplicated.
This is also where the ORDER BY comes from, as it requires that linked entities be in contiguous blocks.
The query that is being generated is not all that inefficient. If you look at the estimated execution plan, you will see that the COUNT(*) expense is very minimal. The ORDER BY clause should be ordering by your primary key, which is probably your clustered index, so it should also have very little impact on performance.
One thing to check when testing the performance of your LINQ queries is that context.Log is not set. Setting this to Console.Out will cause a huge performance hit.
Hope this helps.
Edit:
After looking a little closer at the execution plan, I see that even though my COUNT(*) was just a clustered index scan, it was still 33% of my execution, so I agree it is kind of annoying to have this extra sub-select in the SQL. If this really is the performance bottleneck, then you might want to consider creating a view or stored proc to return your results.
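Another option (my own sketch, not from the answers above) is to drop LoadWith and stitch the graph together client-side from two plain queries; this avoids both the COUNT(*) subquery and the joined row duplication. The connectionString variable is a placeholder:

using (var context = new DataContext(connectionString, new AttributeMappingSource()) { ObjectTrackingEnabled = false, DeferredLoadingEnabled = false })
{
    var parents = context.GetTable<LinqParent>().ToList(); // apply the production Where(...) here
    var kidsByParent = context.GetTable<LinqChild>()
                              .ToLookup(c => c.parentId);  // second query, grouped in memory

    foreach (var parent in parents)
    {
        parent.Kids = kidsByParent[parent.Id]; // empty sequence when there are no children
    }
}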

LINQ to XML - selecting XML to a strongly typed object

I have a number of related issues but I will break the questions down into separate posts.
My XML is <Person>.....<Skills><Skill>Resus</Skill></Skills></Person>
My code is:
var products1 = from prd in xDoc.Descendants("Person")
                select new BusinessEntityLayer.Personnel
                {
                    PayrollNo = (String)prd.Element("PayrollNumber"),
                    FirstName = (String)prd.Element("Name"),
                    LastName = (String)prd.Element("Surname"),
                    StreetAddress = (String)prd.Element("StreetAddress"),
                    Suburb = (String)prd.Element("Suburb"),
                    HomePhone = (String)prd.Element("HomePhone"),
                    MobilePhone = (String)prd.Element("MobilePhone"),
                    PagerNumber = (String)prd.Element("PagerNumber"),
                    Email = (String)prd.Element("Email"),
                    RecordType = (String)prd.Element("RecordType"),
                    Skills = (List<String>)prd.Element("Skills")
                };
My Personnel class is strongly typed. It all works perfectly apart from the Skills collection. Skills is a List<Skill>, but my code won't compile: the error is about converting an XLinq Element to a generic List. Nor can I use String[] (refactoring my business class), as I get the same result.
What strategies do people use here?
I think you should be able to do something like this:
Skills = prd.Descendants("Skill").Select(e => new Skill(e.Value)).ToList(),
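If the business class held a List<String> instead, a similar projection should work (a sketch, assuming a non-null <Skills> element containing one <Skill> per skill):

Skills = prd.Element("Skills")
            .Elements("Skill")
            .Select(e => e.Value)
            .ToList(),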
