String intersection using EF Core, Linq / SQL Server - c#

I'm trying to find whether any element of an array of input strings matches any of the comma-separated values stored in a SQL Server column:
class Meeting
{
public int Id { get; set; }
public string? MeetingName { get; set; }
public string DbColumnCommaSeparated { get; set; }
}
meetingQuery.Where(x => ArrayOfInputString
.Any(y => x.DbColumnCommaSeparated.Split(",").Contains(y)))
Is it feasible to do this in an EF Core query using DbFunctions and SQL STRING_SPLIT?

What I can suggest for this particular database design is based on EF Core's mapping of a queryable function to a table-valued function (TVF), similar to @GertArnold's suggestion. However, since the built-in SQL Server STRING_SPLIT function is already a TVF, it can be mapped directly, eliminating the need for a custom db function and migration.
First, we'll need a simple keyless entity type (class) to hold the result, which according to the docs is a table whose rows have a single string column called "value":
[Keyless]
public class StringValue
{
[Column("value")]
public string Value { get; set; }
}
Next is the function itself. It could be defined as an instance method of your db context, but I find it most appropriate as a static method of some custom class, for instance one called SqlFunctions:
public static class SqlFunctions
{
[DbFunction(Name = "STRING_SPLIT", IsBuiltIn = true)]
public static IQueryable<StringValue> Split(string source, string separator)
=> throw new NotSupportedException();
}
Note that this is just a "prototype" which is never supposed to be called (hence the throw in the "implementation"); it only describes the translation of the actual db function call inside the query. Also, the attributes are optional: the name, built-in flag, etc. can be configured fluently instead. I've put them here for clarity and simplicity, since they are enough in this case and the flexibility of the fluent API isn't needed.
Finally, you have to register the db function for your model by adding the following to the OnModelCreating override:
modelBuilder.HasDbFunction(() => SqlFunctions.Split(default, default));
The HasDbFunction overload used is the simplest, type-safe way of providing the information about your method, using a strongly typed expression rather than reflection.
And that's it. Now you can use
var query = db.Set<Meeting>()
.Where(m => SqlFunctions.Split(m.DbColumnCommaSeparated, ",")
.Any(e => ArrayOfInputString.Contains(e.Value)));
which will be translated to something like this:
SELECT [m].[Id], [m].[DbColumnCommaSeparated], [m].[MeetingName]
FROM [Meeting] AS [m]
WHERE EXISTS (
SELECT 1
FROM STRING_SPLIT([m].[DbColumnCommaSeparated], N',') AS [s]
WHERE [s].[value] IN (N'a', N'b', N'c'))
with the IN clause differing depending on the ArrayOfInputString content. I'm kind of surprised it does not get parameterized as in a "normal" Contains translation, but at least it gets translated to something which can be executed server side.
One thing to note is that you need to flip ArrayOfInputString and the split result set in the LINQ query, since another EF Core limitation prevents translation of anything but the Contains method on in-memory collections. Since you are looking for an intersection here, it doesn't matter which one comes first, so putting the queryable first avoids that limitation.
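The equivalence behind this flip can be checked in memory. The following is a minimal LINQ to Objects sketch, not an EF Core query; the `IntersectionCheck` class and its method names are hypothetical and only illustrate that both orderings answer the same "is there any common element" question.

```csharp
using System;
using System.Linq;

public static class IntersectionCheck
{
    // Original shape: iterate the inputs, look each one up in the split column value.
    public static bool InputsFirst(string csv, string[] inputs) =>
        inputs.Any(y => csv.Split(',').Contains(y));

    // Flipped shape: iterate the split values, look each one up in the inputs.
    // This is the form in which the in-memory array only appears inside Contains.
    public static bool SplitFirst(string csv, string[] inputs) =>
        csv.Split(',').Any(e => inputs.Contains(e));

    public static void Main()
    {
        var inputs = new[] { "a", "b", "c" };
        foreach (var csv in new[] { "x,y,c", "x,y,z", "" })
        {
            // Both columns agree for every row.
            Console.WriteLine($"'{csv}': {InputsFirst(csv, inputs)} / {SplitFirst(csv, inputs)}");
        }
    }
}
```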
Now that you have a solution for this db design, is the design good or not? This is somewhat opinion-based, but in general normalized tables and joins are preferred, since they allow db query optimizers to use indexes and statistics when generating execution plans. Joins are usually very efficient because they can almost always use indexes, so in my opinion (and that of most people) you should not count them as a cost when doing the design. In this particular case, though, I'm not sure whether a normalized detail table with an indexed single text value would produce a better execution plan than the above (which, lacking such information, would do a full table scan evaluating the filter for each row), but it's worth trying, and I guess it won't be worse at least.
Also, all this applies to relational databases. Non-relational databases can in fact contain "embedded" arrays or lists of values, which can then be used to store and process such data instead of a comma-separated string. In either case, I would prefer a normalized design, storing the list of values either as "embedded" or in a related detail table instead of a single comma-separated string. But again, that's just a general opinion/preference. The single string (containing a tag list, for instance) is a valid approach which may outperform the others for some operations, so use whatever is appropriate for you. Also note that STRING_SPLIT is not a standard db function, so if you need to work with a database other than SQL Server, you'll have the problem of finding a similar function, if one exists at all.

It's actually pretty easy thanks to EF Core's smooth support for mapping database functions. In this case we need a table-valued function (TVF) that can be called in C# code.
It starts with adding a TVF to the database, which, when using migrations, requires an addition to migration code:
ALTER FUNCTION [dbo].[SplitCsv] (@string nvarchar(max))
RETURNS TABLE
AS
RETURN SELECT ROW_NUMBER() OVER (ORDER BY Item) AS ID, Item
FROM (SELECT Item = [value] FROM string_split(@string, ',')) AS items
This TVF is mapped to a simple class in EF's class model:
class CsvItem
{
public long ID { get; set; }
public string? Item { get; set; }
}
For this mapping to succeed, the SQL function has to return unique ID values, which the ROW_NUMBER() column provides.
Then a method, only to be used in expressions, is added to the context:
public IQueryable<CsvItem> SplitCsv(string csv) => FromExpression(() => SplitCsv(csv));
...and added to the model:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.HasDbFunction(typeof(MyContext)
.GetMethod(nameof(SplitCsv), new[] { typeof(string) })!);
}
That's all! Now you can execute queries like these (in Linqpad, db is a context instance):
db.Meetings.Where(m => db.SplitCsv(m.DbColumnCommaSeparated).Any(i => i.Item == "c")).Dump();
var csv = "a,d,d";
db.Meetings.Where(m => db.SplitCsv(m.DbColumnCommaSeparated)
.Intersect(db.SplitCsv(csv)).Any()).Dump();
Generated SQL:
SELECT [m].[Id], [m].[DbColumnCommaSeparated], [m].[MeetingName]
FROM [Meetings] AS [m]
WHERE EXISTS (
SELECT 1
FROM [dbo].[SplitCsv]([m].[DbColumnCommaSeparated]) AS [s]
WHERE [s].[Item] = N'c')
SELECT [m].[Id], [m].[DbColumnCommaSeparated], [m].[MeetingName]
FROM [Meetings] AS [m]
WHERE EXISTS (
SELECT 1
FROM (
SELECT [s].[ID], [s].[Item]
FROM [dbo].[SplitCsv]([m].[DbColumnCommaSeparated]) AS [s]
INTERSECT
SELECT [s0].[ID], [s0].[Item]
FROM [dbo].[SplitCsv](@__csv_1) AS [s0]
) AS [t])
One remark. Although it can be done, I wouldn't promote it. It's usually much better to store such csv values as a table in the database. It's much easier in maintenance (data integrity!) and querying.

EF Core has FromSqlRaw, so why not use it?
Here is an extension method that works with DbSet<T>:
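public static class EntityFrameworkExtensions
{
// {0} = table name, {1} = column name, {2} = parameter placeholders; identifiers are bracket-quoted.
private const string _stringSplitTemplate = "SELECT * FROM [{0}] t1 WHERE EXISTS (SELECT 1 FROM STRING_SPLIT(t1.[{1}], ',') s WHERE s.[value] IN ({2}))";
public static IQueryable<TEntity> StringSplit<TEntity, TValue>(this DbSet<TEntity> entity, Expression<Func<TEntity, TValue>> keySelector, IEnumerable<TValue> values)
where TEntity : class
{
// Assumes the column name equals the property name.
var columnName = (keySelector.Body as MemberExpression)?.Member?.Name;
if (columnName == null) return entity;
// Pass the values as SQL parameters ({0}, {1}, ...) rather than concatenating them into the SQL text;
// unquoted string values would otherwise produce invalid SQL and invite SQL injection.
var parameters = values.Cast<object>().ToArray();
// An empty IN () list is invalid SQL, so return no rows in that case.
if (parameters.Length == 0) return entity.Where(_ => false);
var placeholders = string.Join(", ", parameters.Select((_, i) => "{" + i + "}"));
var queryString = string.Format(_stringSplitTemplate, entity.EntityType.GetTableName(), columnName, placeholders);
return entity.FromSqlRaw(queryString, parameters);
}
}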
Usage:
var result = context.Meetings.StringSplit(x => x.DbColumnCommaSeparated, ArrayOfInputString).ToList();
This would generate the following SQL:
SELECT *
FROM table t1
WHERE EXISTS (
SELECT 1
FROM STRING_SPLIT(t1.column, ',') s
WHERE
s.value IN(...)
);

ANSWER 1: SQL CLR Approach
STEP 1: Test SQL Schema
create table Meeting
(
Id int identity(1,1) primary key,
MeetingName nvarchar(max) null,
DbColumnCommaSeparated nvarchar(max) not null
)
go
truncate table Meeting
insert into Meeting
values('one','1,2,3,4');
insert into Meeting
values('two','5,6,7,8');
insert into Meeting
values('three','1,2,7,8');
insert into Meeting
values('four','11,22,73,84');
insert into Meeting
values('five','14,25,76,87');
STEP 2: Write SQL CLR Function
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Data.Linq;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBoolean FilterCSVFunction(string source, string item)
{
return new SqlBoolean(Array.Exists(source.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries), i => i == item));
}
}
STEP 3: Enable SQL CLR for database
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'clr strict security', 0;
RECONFIGURE;
STEP 4: Actual Query
DECLARE @y nvarchar(max)
SET @y = '7'
SELECT * FROM Meeting
WHERE dbo.FilterCSVFunction(DbColumnCommaSeparated, @y) = 1
STEP 5: You can then import the function in EF
ANSWER 2 Updated
DECLARE @ArrayOfInputString TABLE(y INT);
INSERT INTO @ArrayOfInputString(y) VALUES(7);
INSERT INTO @ArrayOfInputString(y) VALUES(8);
SELECT DISTINCT M1.* FROM Meeting M1 JOIN
(
SELECT ID, Split.a.value('.', 'VARCHAR(100)') As Data FROM
(SELECT M2.ID, CAST ('<M>' + REPLACE(DbColumnCommaSeparated, ',', '</M><M>') + '</M>' AS XML) as DbXml FROM Meeting M2) A
CROSS APPLY A.DbXml.nodes ('/M') AS Split(a)
) F ON M1.ID = F.ID
WHERE F.Data IN (SELECT y FROM @ArrayOfInputString)

Related

How do I insert multiple records using Dapper while also including other dynamic parameters?

Here is a truncated example of what I'm trying to do:
var stuffToSave = new List<SomeObject> {
public int OtherTableId { get; set; }
public List<Guid> ComponentIds { get; set; }
};
var sql = @"CREATE TABLE Components( ComponentId uniqueidentifier PRIMARY KEY )
INSERT INTO Components VALUES (@WhatGoesHere?)
SELECT * FROM OtherTable ot
JOIN Components c on c.ComponentId = ot.ComponentId
WHERE Id = @OtherTableId
DROP TABLE Components"
Connection.Execute(sql, stuffToSave);
I know from other SO questions that you can pass a list into an insert statement with Dapper, but I can't find any examples that pass a list as well as another parameter (in my example, OtherTableId), or that have a non-object list (List<Guid> as opposed to a List<SomeObject> that has properties with names to reference).
For the second issue, I could select the ComponentIds into a list to give them a name like:
stuffToSave.ComponentIds.Select(c => new { ComponentId = c })
but then I'm not sure what to put in my sql query so that dapper understands to get the ComponentId property from my list of ComponentIds (Line 7)
I would still like to know the real way of accomplishing this, but I have this workaround that uses string interpolation:
var sql = $@"CREATE TABLE Components( ComponentId uniqueidentifier PRIMARY KEY )
INSERT INTO Components VALUES ('{string.Join($"'),{Environment.NewLine}('", request.ComponentIds)}')
SELECT * FROM OtherTable ot
JOIN Components c on c.ComponentId = ot.ComponentId
WHERE Id = @OtherTableId
DROP TABLE Components"
I'm not worried about SQL Injection since this is just interpolating a list of Guids, but I'd rather avoid this method if possible.
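One shape that is often used for this, sketched here without a database: project each Guid into an object whose property names match the SQL parameter names, carrying the shared OtherTableId along. Dapper's Execute, when handed a sequence, runs the command once per element and binds parameters from the element's properties by name. The `ComponentRow` and `ComponentRows` names are hypothetical, and the Dapper call appears only in a comment because it needs a live connection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: one object per Guid, with names matching the SQL parameters.
public sealed class ComponentRow
{
    public Guid ComponentId { get; set; }
    public int OtherTableId { get; set; }
}

public static class ComponentRows
{
    // Pairs every component id with the single shared OtherTableId.
    public static List<ComponentRow> Build(int otherTableId, IEnumerable<Guid> componentIds) =>
        componentIds
            .Select(id => new ComponentRow { ComponentId = id, OtherTableId = otherTableId })
            .ToList();

    public static void Main()
    {
        var rows = Build(42, new[] { Guid.NewGuid(), Guid.NewGuid() });

        // With Dapper, the INSERT alone could then be executed once per row, e.g.:
        //   connection.Execute("INSERT INTO Components VALUES (@ComponentId)", rows);
        // The CREATE TABLE, SELECT and DROP statements would run as separate commands,
        // since a command given a list is repeated for every element.
        Console.WriteLine(rows.Count);
    }
}
```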

Entity Framework Core; using ORDER BY in query against a (MS) SQL Server

I'm trying to use the following query in combination with Entity Framework Core against a Microsoft SQL Server 2016:
SELECT [a], [b], [c]
FROM [x]
WHERE [a] = {0}
ORDER BY [b]
I use this query like so:
context.MySet.AsNoTracking()
.FromSql(MyQuery, aValue)
.Skip(pageSize * page)
.Take(pageSize)
.Select(x => x.ToJsonDictionary())
.ToList()
I use this in a .NET Core REST API with pagination and I'd like to have the records sorted (alphabetically) to make the pagination more usable.
I get the following error when executing the above statement:
The ORDER BY clause is invalid in views, inline functions, derived
tables, subqueries, and common table expressions, unless TOP, OFFSET
or FOR XML is also specified.Invalid usage of the option NEXT in the
FETCH statement. The ORDER BY clause is invalid in views, inline
functions, derived tables, subqueries, and common table expressions,
unless TOP, OFFSET or FOR XML is also specified.Invalid usage of the
option NEXT in the FETCH statement.
Looking for similar issues, I found some other posts (1, 2, 3), but none of them were used in combination with EF Core and/or they used it in a different context which does not apply in my case (e.g. a subquery).
I tried to use the .OrderBy(..) syntax of EF instead of the ORDER BY in the query, but this doesn't solve the problem. I also tried adding TOP 100 PERCENT after the SELECT in the query in combination with the ORDER BY; this ran, but the ordering was simply ignored. This limitation is described under the EF Limitations. I also found this post that replaces the TOP 100 PERCENT... with TOP 99.99 PERCENT... or TOP 9999999.... This seems like it should work, but it doesn't 'feel' right.
This issue in general is further explained here.
Summary: It is not advisable to use ORDER BY in Views. Use ORDER BY
outside the views. In fact, the correct design will imply the same. If
you use TOP along with Views, there is a good chance that View will
not return all the rows of the table or will ignore ORDER BY
completely.
Further I'm confused by the word "view". For me, the term views refers to the usage of the ones created by the CREATE VIEW .. syntax. Is a plain, 'normal' SQL query also considered a view? Or is EF Core wrapping the request in some sort of view and this is the real issue causing this error?
I'm not sure, but so far all the 'solutions' I found seem kind of 'hacky'.
Thoughts?
Let's simplify things a bit. Here's what I came up with for testing. I've also added some code for printing the generated SQL from EF queries.
class Program
{
static void Main(string[] args)
{
DbClient context = new DbClient();
var rawSql = "select [Id], [Title] from Post order by [Title]";
var query = context.Posts.AsNoTracking()
.FromSql(rawSql)
.Skip(1)
.Take(4)
.OrderBy(x => x.Title);
var generated = query.ToSql();
var results = query.ToList();
}
}
class DbClient : DbContext
{
public DbSet<Post> Posts { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder.UseSqlServer("conn_string");
}
}
class Post
{
public int Id { get; set; }
public string Title { get; set; }
public override string ToString() => $"{Id} | {Title}";
}
When we look at the value of generated we see what the sql of the query is:
SELECT [t].[Id], [t].[Title]
FROM (
SELECT [p].[Id], [p].[Title]
FROM (
select [Id], [Title] from Post order by [Title]
) AS [p]
ORDER BY (SELECT 1)
OFFSET 1 ROWS FETCH NEXT 4 ROWS ONLY
) AS [t]
ORDER BY [t].[Title]
Notice that there are three ORDER BY clauses; the innermost one is the one from rawSql.
We can look at the error message to see why it's not legal:
The ORDER BY clause is invalid in [...] subqueries [...] unless OFFSET [...] is also specified.
The middle order by does include offset, so that's valid even though it's inside a subquery.
To fix this, simply remove the ordering from your rawSql and keep using the OrderBy() LINQ method.
var rawSql = "select [Id], [Title] from Post";
var query = context.Posts.AsNoTracking()
.FromSql(rawSql)
.Skip(1)
.Take(4)
.OrderBy(x => x.Title);
This generates:
SELECT [t].[Id], [t].[Title]
FROM (
SELECT [p].[Id], [p].[Title]
FROM (
select [Id], [Title] from Post
) AS [p]
ORDER BY (SELECT 1)
OFFSET 1 ROWS FETCH NEXT 4 ROWS ONLY
) AS [t]
ORDER BY [t].[Title]
Now, all order by clauses are either not in subqueries, or have an offset clause.
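One thing worth noting about the LINQ above: OrderBy is applied after Skip/Take, so only the rows of the selected page get sorted (the generated SQL shows the paging itself ordered by (SELECT 1)). For pagination over the sorted set, the ordering would normally come before Skip. A minimal in-memory sketch of that operator order using LINQ to Objects follows; `PagingDemo` is a hypothetical name and no EF Core is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Sorts first, then pages: each page is a slice of the globally sorted sequence.
    public static List<string> Page(IEnumerable<string> titles, int page, int pageSize) =>
        titles.OrderBy(t => t, StringComparer.Ordinal)
              .Skip(page * pageSize)
              .Take(pageSize)
              .ToList();

    public static void Main()
    {
        var titles = new[] { "delta", "alpha", "echo", "charlie", "bravo" };
        Console.WriteLine(string.Join(", ", Page(titles, 0, 2))); // alpha, bravo
        Console.WriteLine(string.Join(", ", Page(titles, 1, 2))); // charlie, delta
    }
}
```

With an EF Core queryable the same operator order (OrderBy, then Skip, then Take) is what lets the provider emit the ORDER BY together with the OFFSET/FETCH clause.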

LINQ to SQL: how to select specific fields conditionally into DTOs (performance)

LINQ to SQL is great; however, it backfires on performance in some cases, and in my case it backfires big time.
I have a StudentServiceDm that maps to a StudentService table with 40-some fields, and the table holds 10-20 million services. The requirement from the client is that they need to export all services into Excel from the application, while selecting only the fields they want at the time.
From my understanding, if I do
IQueryable<StudentServiceDm> serviceQuery = GetListQuery();
List<StudentServiceDto> dtos = serviceQuery.Select(m => new StudentServiceDto
{
Id = m.Id,
Name = m.Name,
}).ToList();
then this translates into SQL that selects only the fields I need
SELECT Id, Name FROM StudentService
which improves performance a lot by reducing the number of fields that need to be selected.
But in my case I need to allow the user to conditionally select fields like this, so I do this:
IQueryable<StudentServiceDm> serviceQuery = GetListQuery();
List<StudentServiceDto> dtos = serviceQuery.Select(m => new StudentServiceDto
{
Id = (options.ShowId) ? m.Id : null,
Name = (options.ShowName) ? m.Name : null,
StudentFirstName = (options.ShowFirstName) ? m.Student.FirstName : null,
StudentLastName = (options.ShowLastName) ? m.Student.LastName : null,
StudentServiceType = (options.ShowType) ? m.StudentServiceType.Name : null,
StudentServiceSubType = (options.ShowSubType) ? m.StudentServiceSubType.Name : null,
Date = (options.ShowDate) ? m.Date : null,
// a lot more assignments .....
}).ToList();
However, this translates into a SQL query something like this (according to SQL Profiler):
SELECT
CASE WHEN (@p__linq__1 = 1) THEN [Project2].[Date] END AS [C1],
CASE WHEN (@p__linq__2 = 1) THEN [Project2].[Name] END AS [C2],
CASE WHEN (@p__linq__3 = 1) THEN [Project2].[StudentServiceTypeId] END AS [C3],
CASE WHEN (@p__linq__4 = 1) THEN [Project2].[StudentServiceSubTypeId] END AS [C4],
CASE WHEN (@p__linq__5 = 1) THEN [Project2].[Date] END AS [C5],
// ..... more fields
FROM * //
where it puts a condition on every field to check whether it should be retrieved ... and this query's performance is almost the same as if all fields were selected.
What I want to achieve is, if the user only chooses to select Name and Date, then the translated SQL should only be
SELECT Name, Date FROM StudentService
After some performance analysis, the performance difference between these two statements is about 3-4 times.
Any expert opinion?
You can create LINQ expressions at runtime that correspond exactly to what you'd type inside a .Select() call; it's just not very straightforward. I never did it for LINQ to SQL specifically, but I don't see why it wouldn't work. See something like "How do I dynamically create an Expression<Func<MyClass, bool>> predicate?", but of course I'd recommend reading the docs first to get familiar with expressions.
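As a rough illustration of that idea, the sketch below builds a projection of the form m => new TDto { A = m.A, B = m.B } for just the requested property names. It is only a sketch under assumptions: `ServiceRow`, `ServiceDto` and `ProjectionBuilder` are hypothetical names, source and DTO properties are assumed to share names and be flat (no navigation paths such as m.Student.FirstName), and it is exercised here against an in-memory queryable, so whether a particular provider translates it to a narrow SELECT would still need to be verified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins for the entity and DTO from the question.
public class ServiceRow
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime Date { get; set; }
}

public class ServiceDto
{
    public int? Id { get; set; }
    public string Name { get; set; }
    public DateTime? Date { get; set; }
}

public static class ProjectionBuilder
{
    // Builds m => new TDto { F1 = m.F1, F2 = m.F2, ... } for the requested names only.
    public static Expression<Func<TSource, TDto>> Build<TSource, TDto>(IEnumerable<string> fields)
        where TDto : new()
    {
        var m = Expression.Parameter(typeof(TSource), "m");
        var bindings = new List<MemberBinding>();
        foreach (var name in fields)
        {
            var target = typeof(TDto).GetProperty(name)
                ?? throw new ArgumentException($"Unknown DTO property '{name}'.");
            Expression value = Expression.Property(m, name);
            if (value.Type != target.PropertyType)
                value = Expression.Convert(value, target.PropertyType); // e.g. int -> int?
            bindings.Add(Expression.Bind(target, value));
        }
        var body = Expression.MemberInit(Expression.New(typeof(TDto)), bindings);
        return Expression.Lambda<Func<TSource, TDto>>(body, m);
    }

    public static void Main()
    {
        var selector = Build<ServiceRow, ServiceDto>(new[] { "Name", "Date" });
        Console.WriteLine(selector); // the lambda assigns only Name and Date

        var rows = new[] { new ServiceRow { Id = 1, Name = "Tutoring", Date = new DateTime(2020, 1, 1) } };
        var dto = rows.AsQueryable().Select(selector).Single();
        Console.WriteLine($"Id set: {dto.Id.HasValue}, Name: {dto.Name}");
    }
}
```

Because unselected properties never appear in the expression tree, there are no per-column conditionals left for the provider to translate.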
The only way to do what you are interested in doing is to create lots of different Select() sections, each corresponding to a different combination of fields. However, you would need an awful lot of different combinations (2^N, where N is the number of fields), so it is not really practical or maintainable.
There is no other way to do this using L2S.
If performance is important for you in this aspect, I suggest creating the sql directly and using something like Dapper to retrieve the data. Then you can easily control exactly the fields that you want to include.

Join query based on name of the Entity

I have a database that gets loaded based on Template T. However, now I want to join other tables based on strings or passing in a "T2" Template.
How can I create a function like this to generate an IQueryable?
public void createJoinedTable<T, T2>(T2 join_table, string join_on_this_property, string order, string order_by)
where T : class
where T2 : class
{
var table = GetGenericTable<T>(); // I have the IQueryable<T> of the main table.
// now join the joined table.
var id = 1;
table = table // your starting point - table in the "from" statement
.Join(join_table, // the source table of the inner join
firsttable => post.myid, // Select the primary key (the first part of the "on" clause in an sql "join" statement)
secondtable => meta.othertableid, // Select the foreign key (the second part of the "on" clause)
(firsttable, secondtable) => new { Unknown = firsttable , Unknown2 = secondtable}) // selection
.Where(x => x.Unknown.ID == id); // where statement
table = table.CustomOrderByDescending(order_by, direction); // custom ordering by string
m_queryable = table; // record results.
}
The problem is, that I cannot do a .Join() because it is not constrained by the Entity class. It's constrained as a generic "class".
Where T : class instead of where T: MyEntityTable
Well, if I did that in the arguments, then what's the point of having a "generic join table function"?
I want to be able to join whatever I want and based on text-based arguments.
How would I use "join_on_this_property" to help me accomplish this?
BONUS Challenge: Join unlimited amounts of tables based on "List tables, List join_ON_properties"--but that could be very complicated.
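One possible direction, sketched under assumptions: the key selectors can be built from property names with expression trees, while the entity types stay generic. In this sketch the key type (`TKey`) still has to be known at compile time, and `Joined`, `DynamicJoin`, `BlogPost` and `PostMeta` are hypothetical names used only for the demo, which runs against in-memory queryables.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical result type holding both sides of the join.
public class Joined<TOuter, TInner>
{
    public TOuter Outer { get; set; }
    public TInner Inner { get; set; }
}

public static class DynamicJoin
{
    // Builds x => x.<propertyName> as a typed key selector.
    public static Expression<Func<T, TKey>> KeySelector<T, TKey>(string propertyName)
    {
        var x = Expression.Parameter(typeof(T), "x");
        return Expression.Lambda<Func<T, TKey>>(Expression.Property(x, propertyName), x);
    }

    // Inner join of two queryables on properties identified by name.
    public static IQueryable<Joined<T, T2>> JoinOn<T, T2, TKey>(
        IQueryable<T> outer, IQueryable<T2> inner, string outerProperty, string innerProperty)
    {
        return outer.Join(
            inner,
            KeySelector<T, TKey>(outerProperty),
            KeySelector<T2, TKey>(innerProperty),
            (o, i) => new Joined<T, T2> { Outer = o, Inner = i });
    }
}

// Hypothetical entities for the demo.
public class BlogPost { public int MyId { get; set; } public string Title { get; set; } }
public class PostMeta { public int OtherTableId { get; set; } public string Value { get; set; } }

public static class JoinDemo
{
    public static void Main()
    {
        var posts = new[]
        {
            new BlogPost { MyId = 1, Title = "first" },
            new BlogPost { MyId = 2, Title = "second" },
        }.AsQueryable();
        var metas = new[] { new PostMeta { OtherTableId = 2, Value = "tag" } }.AsQueryable();

        var joined = DynamicJoin.JoinOn<BlogPost, PostMeta, int>(posts, metas, "MyId", "OtherTableId").ToList();
        Console.WriteLine($"{joined.Count}: {joined[0].Outer.Title} / {joined[0].Inner.Value}");
    }
}
```

Making the key type dynamic as well would mean calling Queryable.Join through reflection, which is beyond this sketch.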

Using IQueryable with Linq

What is the use of IQueryable in the context of LINQ?
Is it used for developing extension methods or any other purpose?
Marc Gravell's answer is very complete, but I thought I'd add something about this from the user's point of view, as well...
The main difference, from a user's perspective, is that, when you use IQueryable<T> (with a provider that supports things correctly), you can save a lot of resources.
For example, if you're working against a remote database, many ORM systems give you the option of fetching data from a table in two ways: one which returns IEnumerable<T>, and one which returns IQueryable<T>. Say, for example, you have a Products table, and you want to get all of the products whose cost is $25 or more.
If you do:
IEnumerable<Product> products = myORM.GetProducts();
var productsOver25 = products.Where(p => p.Cost >= 25.00);
What happens here, is the database loads all of the products, and passes them across the wire to your program. Your program then filters the data. In essence, the database does a SELECT * FROM Products, and returns EVERY product to you.
With the right IQueryable<T> provider, on the other hand, you can do:
IQueryable<Product> products = myORM.GetQueryableProducts();
var productsOver25 = products.Where(p => p.Cost >= 25.00);
The code looks the same, but the difference here is that the SQL executed will be SELECT * FROM Products WHERE Cost >= 25.
From your POV as a developer, this looks the same. However, from a performance standpoint, you may only return 2 records across the network instead of 20,000....
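The mechanism behind this can be observed without any database. In the minimal sketch below, an in-memory list is wrapped with AsQueryable() purely for illustration (a real provider would translate the tree to SQL instead of running it in memory); `Product` and `QueryableVsEnumerable` are hypothetical names. The point is that the filter on an IQueryable<T> is recorded as data in the query's Expression, where a provider can inspect it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Product
{
    public string Name { get; set; }
    public decimal Cost { get; set; }
}

public static class QueryableVsEnumerable
{
    // Composes a filter onto whatever IQueryable<Product> it is given.
    public static IQueryable<Product> AtLeast25(IQueryable<Product> source) =>
        source.Where(p => p.Cost >= 25m);

    public static void Main()
    {
        var data = new List<Product>
        {
            new Product { Name = "Pen", Cost = 2m },
            new Product { Name = "Lamp", Cost = 40m },
        };

        IQueryable<Product> query = AtLeast25(data.AsQueryable());

        // The outermost node of the tree is the call to Queryable.Where.
        var call = (MethodCallExpression)query.Expression;
        Console.WriteLine(call.Method.DeclaringType.Name + "." + call.Method.Name); // Queryable.Where
        Console.WriteLine(call.Arguments[1]); // the filter lambda, still available as data
        Console.WriteLine(query.Count()); // 1
    }
}
```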
In essence its job is very similar to IEnumerable<T> - to represent a queryable data source - the difference being that the various LINQ methods (on Queryable) can be more specific, to build the query using Expression trees rather than delegates (which is what Enumerable uses).
The expression trees can be inspected by your chosen LINQ provider and turned into an actual query - although that is a black art in itself.
This is really down to the ElementType, Expression and Provider - but in reality you rarely need to care about this as a user. Only a LINQ implementer needs to know the gory details.
Re comments; I'm not quite sure what you want by way of example, but consider LINQ-to-SQL; the central object here is a DataContext, which represents our database-wrapper. This typically has a property per table (for example, Customers), and a table implements IQueryable<Customer>. But we don't use that much directly; consider:
using(var ctx = new MyDataContext()) {
var qry = from cust in ctx.Customers
where cust.Region == "North"
select new { cust.Id, cust.Name };
foreach(var row in qry) {
Console.WriteLine("{0}: {1}", row.Id, row.Name);
}
}
this becomes (by the C# compiler):
var qry = ctx.Customers.Where(cust => cust.Region == "North")
.Select(cust => new { cust.Id, cust.Name });
which is again interpreted (by the C# compiler) as:
var qry = Queryable.Select(
Queryable.Where(
ctx.Customers,
cust => cust.Region == "North"),
cust => new { cust.Id, cust.Name });
Importantly, the static methods on Queryable take expression trees, which - rather than regular IL, get compiled to an object model. For example - just looking at the "Where", this gives us something comparable to:
var cust = Expression.Parameter(typeof(Customer), "cust");
var lambda = Expression.Lambda<Func<Customer,bool>>(
Expression.Equal(
Expression.Property(cust, "Region"),
Expression.Constant("North")
), cust);
... Queryable.Where(ctx.Customers, lambda) ...
Didn't the compiler do a lot for us? This object model can be torn apart, inspected for what it means, and put back together again by the TSQL generator - giving something like:
SELECT c.Id, c.Name
FROM [dbo].[Customer] c
WHERE c.Region = 'North'
(the string might end up as a parameter; I can't remember)
None of this would be possible if we had just used a delegate. And this is the point of Queryable / IQueryable<T>: it provides the entry-point for using expression trees.
All this is very complex, so it is a good job that the compiler makes it nice and easy for us.
For more information, look at "C# in Depth" or "LINQ in Action", both of which provide coverage of these topics.
Although Reed Copsey and Marc Gravell have already described IQueryable (and IEnumerable) well enough, I want to add a little more by providing a small example of IQueryable and IEnumerable, as many users asked for one.
Example: I have created two tables in the database:
CREATE TABLE [dbo].[Employee]([PersonId] [int] NOT NULL PRIMARY KEY,[Gender] [nchar](1) NOT NULL)
CREATE TABLE [dbo].[Person]([PersonId] [int] NOT NULL PRIMARY KEY,[FirstName] [nvarchar](50) NOT NULL,[LastName] [nvarchar](50) NOT NULL)
The primary key (PersonId) of table Employee is also a foreign key referencing PersonId in table Person.
Next I added an ADO.NET entity model to my application and created the service class below on top of it:
public class SomeServiceClass
{
public IQueryable<Employee> GetEmployeeAndPersonDetailIQueryable(IEnumerable<int> employeesToCollect)
{
DemoIQueryableEntities db = new DemoIQueryableEntities();
var allDetails = from Employee e in db.Employees
join Person p in db.People on e.PersonId equals p.PersonId
where employeesToCollect.Contains(e.PersonId)
select e;
return allDetails;
}
public IEnumerable<Employee> GetEmployeeAndPersonDetailIEnumerable(IEnumerable<int> employeesToCollect)
{
DemoIQueryableEntities db = new DemoIQueryableEntities();
var allDetails = from Employee e in db.Employees
join Person p in db.People on e.PersonId equals p.PersonId
where employeesToCollect.Contains(e.PersonId)
select e;
return allDetails;
}
}
Both methods contain the same LINQ. They are called in Program.cs as shown below:
class Program
{
static void Main(string[] args)
{
SomeServiceClass s= new SomeServiceClass();
var employeesToCollect= new []{0,1,2,3};
//IQueryable execution part
var IQueryableList = s.GetEmployeeAndPersonDetailIQueryable(employeesToCollect).Where(i => i.Gender=="M");
foreach (var emp in IQueryableList)
{
System.Console.WriteLine("ID:{0}, EName:{1},Gender:{2}", emp.PersonId, emp.Person.FirstName, emp.Gender);
}
System.Console.WriteLine("IQueryable contain {0} row in result set", IQueryableList.Count());
//IEnumerable execution part
var IEnumerableList = s.GetEmployeeAndPersonDetailIEnumerable(employeesToCollect).Where(i => i.Gender == "M");
foreach (var emp in IEnumerableList)
{
System.Console.WriteLine("ID:{0}, EName:{1},Gender:{2}", emp.PersonId, emp.Person.FirstName, emp.Gender);
}
System.Console.WriteLine("IEnumerable contain {0} row in result set", IEnumerableList.Count());
Console.ReadKey();
}
}
The output is obviously the same for both:
ID:1, EName:Ken,Gender:M
ID:3, EName:Roberto,Gender:M
IQueryable contain 2 row in result set
ID:1, EName:Ken,Gender:M
ID:3, EName:Roberto,Gender:M
IEnumerable contain 2 row in result set
So the question is: what/where is the difference? There does not seem to be any, right? Really!!
Let's have a look at the SQL queries generated and executed by Entity Framework 5 during this period.
IQueryable execution part
--IQueryableQuery1
SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[Gender] AS [Gender]
FROM [dbo].[Employee] AS [Extent1]
WHERE ([Extent1].[PersonId] IN (0,1,2,3)) AND (N'M' = [Extent1].[Gender])
--IQueryableQuery2
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Employee] AS [Extent1]
WHERE ([Extent1].[PersonId] IN (0,1,2,3)) AND (N'M' = [Extent1].[Gender])
) AS [GroupBy1]
IEnumerable execution part
--IEnumerableQuery1
SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[Gender] AS [Gender]
FROM [dbo].[Employee] AS [Extent1]
WHERE [Extent1].[PersonId] IN (0,1,2,3)
--IEnumerableQuery2
SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[Gender] AS [Gender]
FROM [dbo].[Employee] AS [Extent1]
WHERE [Extent1].[PersonId] IN (0,1,2,3)
Common script for both execution part
/* these two query will execute for both IQueryable or IEnumerable to get details from Person table
Ignore these two queries here because it has nothing to do with IQueryable vs IEnumerable
--ICommonQuery1
exec sp_executesql N'SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM [dbo].[Person] AS [Extent1]
WHERE [Extent1].[PersonId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=1
--ICommonQuery2
exec sp_executesql N'SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM [dbo].[Person] AS [Extent1]
WHERE [Extent1].[PersonId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=3
*/
So you have a few questions now; let me guess them and try to answer.
Why are different scripts generated for the same result?
Let's look at some points here.
All queries have one common part:
WHERE [Extent1].[PersonId] IN (0,1,2,3)
Why? Because both methods, IQueryable<Employee> GetEmployeeAndPersonDetailIQueryable and IEnumerable<Employee> GetEmployeeAndPersonDetailIEnumerable of SomeServiceClass, contain one common line in their LINQ queries:
where employeesToCollect.Contains(e.PersonId)
Then why is the
AND (N'M' = [Extent1].[Gender]) part missing in the IEnumerable execution part, while in both calls we used Where(i => i.Gender == "M") in Program.cs?
This is the point where the difference between IQueryable and IEnumerable shows up.
When an IQueryable method is called, Entity Framework takes the LINQ statement written inside the method and checks whether more LINQ expressions are defined on the result set; it gathers all LINQ operators applied until the result actually needs to be fetched and constructs a more appropriate SQL query to execute.
It provides a lot of benefits, like:
only those rows that are valid for the whole LINQ query are returned by SQL Server
it helps SQL Server performance by not selecting unnecessary rows
network cost is reduced
In the example here, SQL Server returned only two rows to the application after the IQueryable execution but returned three rows for the IEnumerable query. Why?
In the case of the IEnumerable method, Entity Framework takes the LINQ statement written inside the method and constructs the SQL query when the result needs to be fetched. It does not include the rest of the LINQ operators in the SQL query. Here, for example, no filtering on the Gender column is done in SQL Server.
But the outputs are the same? Yes, because IEnumerable filters the result further at the application level after retrieving it from SQL Server.
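The switch between the two behaviours is decided by the declared (static) type alone, which a small in-memory sketch can show. `ReturnTypeDemo` and its methods are hypothetical names, and AsQueryable() over an array stands in for a real provider; the same source is exposed once as IQueryable<int> and once as IEnumerable<int>, and the identical-looking Where call binds to a different method each time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReturnTypeDemo
{
    // Hypothetical service methods: same source, different declared return types.
    public static IQueryable<int> AsQueryableResult(IQueryable<int> source) => source;
    public static IEnumerable<int> AsEnumerableResult(IQueryable<int> source) => source;

    // Binds to Queryable.Where: the filter becomes part of the query's expression tree.
    public static object FilterComposed(IQueryable<int> source) =>
        AsQueryableResult(source).Where(i => i > 1);

    // Binds to Enumerable.Where: the filter runs in memory over whatever the source yields.
    public static object FilterInMemory(IQueryable<int> source) =>
        AsEnumerableResult(source).Where(i => i > 1);

    public static void Main()
    {
        var source = new[] { 0, 1, 2, 3 }.AsQueryable();
        Console.WriteLine(FilterComposed(source) is IQueryable<int>); // True
        Console.WriteLine(FilterInMemory(source) is IQueryable<int>); // False
    }
}
```

Both results enumerate to the same values; only the place where the filter is evaluated differs.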
So, what should someone choose?
I personally prefer to define the function result as IQueryable<T> because it has many benefits over IEnumerable; for example, you can join two or more IQueryable functions, which generates a more specific script for SQL Server.
In the example you can see that the IQueryable query (IQueryableQuery2) generates a more specific script than the IEnumerable query (IEnumerableQuery2), which is much more acceptable from my point of view.
It allows for further querying further down the line. If this was beyond a service boundary say, then the user of this IQueryable object would be allowed to do more with it.
For instance, if you were using lazy loading with NHibernate, this might result in the graph being loaded when/if needed.
