Linq Query for a particular scenario

Linq Query for a particular scenario - c#

Class Demo
{
public int Id{get;set;};
public string Name{get;set;}
public int Parent{get;set;};
public IList<Demo> children{get;set;}
}
Now my code returns List demos. The data is three level nested. I mean List can
contain again Children1 based on each upper level Id matching nested level Parent. It goes another level down where the upper Children1 each ID matches nested Parent and another List of Demos. How can I write an optimized query from it. *
List Demos will have huge data then each Demo Id matches with Parent if matches List of Children (suppose children1) is fetched and then based on the each child in Children 1 which matches Id with Parent another List of Children is filled.

Since you are saying Linq To SQL, I assume this has a backing table with a self join which is already an optimized way of defining such structures. If you generate your Linq To SQL model from such a table then your model would already have navigational properties for 'parent' and 'children'. SQL Server sample database's Employees table is a good example for this. Based on its model you would have something like:
var e = Employees.Select(em => new {
em.EmployeeID,
em.FirstName,
em.LastName,
em.ReportsToChildren
});
and that would generate this SQL:
SELECT [t0].[EmployeeID], [t0].[FirstName], [t0].[LastName], [t1].[EmployeeID] AS [EmployeeID2], [t1].[LastName] AS [LastName2], [t1].[FirstName] AS [FirstName2], [t1].[Title], [t1].[TitleOfCourtesy], [t1].[BirthDate], [t1].[HireDate], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[HomePhone], [t1].[Extension], [t1].[Photo], [t1].[Notes], [t1].[ReportsTo], [t1].[PhotoPath], (
SELECT COUNT(*)
FROM [Employees] AS [t2]
WHERE [t2].[ReportsTo] = [t0].[EmployeeID]
) AS [value]
FROM [Employees] AS [t0]
LEFT OUTER JOIN [Employees] AS [t1] ON [t1].[ReportsTo] = [t0].[EmployeeID]
ORDER BY [t0].[EmployeeID], [t1].[EmployeeID]
Using ToList() on this is a slight detail for enumeration.
Note: Using a utility like LinqPad, you can test this quickly and easily.

Related

Linq query without using Joins for two tables distinct value

Table 1
Contract ID , Vendor Name, Description , user
Table 2
Contract ID , product , department
Match condition : for all the Contract ID matching with table 1 , get their Vendor Name and Contract ID
Query result output :
Contract ID(Distinct),Vendor Name
Below code using inner join , need same output without using join as linq query
\\
select table1.Contract ID,table1.Vendor Name ,table2.Contract ID
from table1 as s1
inner join table2 as s2
on s1.Contract ID=s2.Contract ID
\\\
Thanks in Advance

Considering you need only Join alternative to select distinct ,you can use inner query logic like below to write LINQ
SELECT contractorid ,vendor name
Where
Contracterid in
(Select distinct contractor id from table2)
Here assumption is contractorId is primary key in table 1

If I understand correctly, you want to retrieve a collection of objects containing Contract Id and Vendor Names, without duplicates, whose Contract Id is found in Table 2.
It is unclear if you are using Linq to objects, Linq to Entities, or any other Linq flavor, which would have a meaningful impact on how to best construct a query for a specific purpose.
But as a first hint, here is a way to perform this without join with Linq:
// Get a list of all distinct Contract Ids in Table 2
var AllTable2ContractIds = Table2
.Select(e => e.ContractId)
.Distinct()
.ToList();
// With Table 1
// Keep only entries whose contract Id is found in the list just contructed above.
// transform it to keep Contract Id and Vendor Name.
// The use of ToList() at the end is not mandatory.
// It depends if you want to materialize the result or not.
var Result = Table1
.Where(e => AllTable2ContractIds.Contains(e.ContractId))
.Select(e => new
{
e.ContractId,
e.VendorName
})
.ToList();

Linq Multiple Joins

I have some sql tables that I need to query information from my current query that returns a single column list is:
from f in FactSales
where f.DateKey == 20130921
where f.CompanyID <= 1
join item in DimMenuItems
on f.MenuItemKey equals item.MenuItemKey
join dmi in DimMenuItemDepts
on item.MenuItemDeptKey equals dmi.MenuItemDeptKey
group f by dmi.MenuItemDeptKey into c
select new {
Amount = c.Sum(l=>l.Amount)
}
This returns the data I want and it groups correctly by the third table I join but I cannot get the Description column from the dmi table. I have tried to add the field
Description = dmi.Description
but it doesnt work. How can I get data from the third table into the new select that I am creating with this statement? Many thanks for any help.

Firstly you are using Entity Framework COMPLETELY WRONG. Linq is NOT SQL.
You shouldn't be using join. Instead you should be using Associations.
So instead, your query should look like...
from sale in FactSales
where sale.DateKey == 20130921
where sale.CompanyID <= 1
group sale by sale.Item.Department into c
select new
{
Amount = c.Sum(l => l.Amount)
Department = c.Key
}
By following Associations, you will automatically be implicitly joining.
You should not be grouping by the id of the "table" but by the actual "row", or in Object parlance (which is what you should be using in EF, since the raison d'etre of an ORM is to convert DB to Object), is that you should be grouping by the "entity" rather than they the "entity's key".
EF already knows that the key is unique to the entity.
The grouping key word only allows you to access sale and sale.Item.Department after it. It is a transform, rather than an operator like in SQL.

Linq Query to Get Distinct Records from Two Tables

I have two Tables - tblExpenses and tblCategories as follows
tblExpenses
ID (PK),
Place,
DateSpent,
CategoryID (FK)
tblCategory
ID (PK),
Name
I tried various LINQ approaches to get all distinct records from the above two tables but not with much success. I tried using UNION and DISTINCT but it didnt work.
The above two tables are defined in my Model section of my project which in turn will create tables in SQLite. I need to retrieve all the distinct records from both the tables to display values in gridview.
Kindly provide me some inputs to accomplish this task. I did some research to find answer to this question but nothing seemed close to what I wanted. Excuse me if I duplicated this question.
Here is the UNION, DISTINCT approaches I tried:
DISTINCT # ==> Gives me Repetitive values
(from exp in db.Table<tblExpenses >()
from cat in db.Table<tblCategory>()
select new { exp.Id, exp.CategoryId, exp.DateSpent, exp.Expense, exp.Place, cat.Name }).Distinct();
UNION # ==> Got an error while using UNION

I think union already does the distict when you join the two tables you can try somethin like
var query=(from c in db.tblExpenses select c).Concat(from c in
db.tblCategory select c).Distinct().ToList();

You will always get DISTINCT records, since you are selecting the tblExpenses.ID too. (Unless there are multiple categories with the same ID. But that of course would be really, really bad design.)
Remember, when making a JOIN in LINQ, both field names and data types should be the same. Is the field tblExpenses.CategoryID a nullable field?
If so, try this JOIN:
db.Table<tblExpenses>()
.Join(db.Table<tblCategory>(),
exp => new { exp.CategoryId },
cat => new { CategoryId = (int?)cat.ID },
(exp, cat) => new {
exp.Id,
exp.CategoryId,
exp.DateSpent,
exp.Expense,
exp.Place,
cat.Name
})
.Select(j => new {
j.Id,
j.CategoryId,
j.DateSpent,
j.Expense,
j.Place,
j.Name
});

You can try this queries:
A SELECT DISTINCT query like this:
SELECT DISTINCT Name FROM tblCategory INNER JOIN tblExpenses ON tblCategory.categoryID = tblExpenses.categoryID;
limits the results to unique values in the output field. The query results are not updateable.
or
A SELECT DISTINCTROW query like this:
SELECT DISTINCTROW Name FROM tblCategory INNER JOIN tblExpenses ON tblCategory.categoryID = tblExpenses.categoryID;<br/><br/>
looks at the entire underlying tables, not just the output fields, to find unique rows.
reference:http://www.fmsinc.com/microsoftaccess/query/distinct_vs_distinctrow/unique_values_records.asp

Structuring large SQL rowset(s) and consuming in .NET

Take a look at this psuedo schema (please note this is a simplification so please try not to comment too heavily on the "advisability" of the schema itself). Assume Indexes are inplace on the FKs.
TABLE Lookup (
Lookup_ID int not null PK
Name nvarchar(255) not null
)
TABLE Document (
Document_ID int not null PK
Previous_ID null FK REFERENCES Document(Document_ID)
)
TABLE Document_Lookup (
Document_ID int not null FK REFERENCES Document(Document_ID)
Lookup_ID int not null FK REFERENCES Lookup(Lookup_ID)
)
Volumes: Document, 4 Million rows of which 90% have a null Previous_ID field value; Lookup, 6000 rows, Average lookups attached to each document 20 giving Document_Lookup 80 Millions rows.
Now in a .NET Service have structure to represent a Lookup row like this:-
struct Lookup
{
public int ID;
public string Name;
public List<int> DocumentIDs;
}
and that lookup rows are stored in a Dictionary<int, Lookup> where the key is the lookup ID. An important point here is that this dictionary should contain entries where the Lookup is referenced by at least one document, i.e., the list DocumentIDs should have Count > 0.
My task is populate this dictionary efficiently. So the simple approach would be:-
SELECT dl.Lookup_ID, l.Name, dl.Document_ID
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Lookup_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID, dl.Document_ID
This could then be used to populate a the dictionary fairly efficiently.
The Question: Does the underlying rowset delivery (TDS?) perform some optimization? It seems to me that queries that de-normalise data are very common hence the possiblity that field values don't change from one row to the next is high, hence it would make sense to optomise the stream by not sending field values that haven't changed. Does anyone know whether such an optomisation is in place? (Optomisation does not appear to exist).
What more sophisticated query could I use to eliminate the duplication (I'm think specifically of repeating the name value)? I've heard of such a thing a "nested rowset", can that sort of thing be generated? Would it be more performant? How would I access it in .NET?
I would perform two queries; one to populate the Lookup dictionary then a second to populate the ditionary lists. I would then add code to knock out the unused Lookup entires. However imagine I got my predictions wrong and Lookup ended up being 1 Million rows with only a quarter actually referenced by any document?

As long as the names are relatively short in practice, the optimisation may not be necessary.
The easiest optimisation is to split it into two queries, one to get the names, the other to get the Document_ID list. (can be in the other order if it makes it easier to populate your data structures).
Example:
/*First get the name of the Lookup*/
select distinct dl.Lookup_ID, l.Name
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Lookup_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID, dl.Document_ID
/*Now get the list of Document_IDs for each*/
SELECT dl.Lookup_ID, dl.Document_ID
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Lookup_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID, dl.Document_ID
There are also various tricks you could use to massage these into a single table but I suggest these are not worthwile.
The heirarchical rowsets you are thinking of are the MSDASHAPE OLEDB provider. They can do what you are suggesting but would restrict you to using the OLEDB provider for SQL which may not be what you want.
Finally consider careful XML
For example:
select
l.lookup_ID as "#l",
l.name as "#n",
(
select dl.Document_ID as "node()", ' ' as "node()"
from Document_Lookup dl where dl.lookup_ID = l.lookup_ID for xml path(''), type
) as "*"
from Lookup l
where l.lookup_ID in (select dl.lookup_ID from Document_Lookup dl)
for xml path('dl')
returns:
<dl l="1" n="One">1 2 </dl>
<dl l="2" n="Two">2 </dl>

When you're asking about "nested rowsets" are you referring to using the DbDataReader.NextResult() method?
if your query has two "outputs" (two select statements which return a separate resultsets), you can loop through the first using DbDataReader.Next() and when that returns "false" then you can call DbDataReader.NextResult() and then use DbDataReader.Next() again to continue.
var reader = cmd.ExecuteReader();
while(reader.Read()){
// load data
}
if(reader.NextResult()){
while(reader.Read()){
// lookup record from first result
// load data from second result
}
}
I've done this frequently to reduce duplicate data in a similar situation and it works really well:
SELECT * FROM tableA WHERE [condition]
SELECT * FROM tableB WHERE EXISTS (SELECT * FROM tableA WHERE [condition] AND tableB.FK = tableA.PK)
Disclaimer: I haven't tried this with a resultset as large as you're describing.
The downside of this is you'll need a way to map the second resultset to the first, using a hashtable or order list.

Using IQueryable with Linq

What is the use of IQueryable in the context of LINQ?
Is it used for developing extension methods or any other purpose?

Marc Gravell's answer is very complete, but I thought I'd add something about this from the user's point of view, as well...
The main difference, from a user's perspective, is that, when you use IQueryable<T> (with a provider that supports things correctly), you can save a lot of resources.
For example, if you're working against a remote database, with many ORM systems, you have the option of fetching data from a table in two ways, one which returns IEnumerable<T>, and one which returns an IQueryable<T>. Say, for example, you have a Products table, and you want to get all of the products whose cost is >$25.
If you do:
IEnumerable<Product> products = myORM.GetProducts();
var productsOver25 = products.Where(p => p.Cost >= 25.00);
What happens here, is the database loads all of the products, and passes them across the wire to your program. Your program then filters the data. In essence, the database does a SELECT * FROM Products, and returns EVERY product to you.
With the right IQueryable<T> provider, on the other hand, you can do:
IQueryable<Product> products = myORM.GetQueryableProducts();
var productsOver25 = products.Where(p => p.Cost >= 25.00);
The code looks the same, but the difference here is that the SQL executed will be SELECT * FROM Products WHERE Cost >= 25.
From your POV as a developer, this looks the same. However, from a performance standpoint, you may only return 2 records across the network instead of 20,000....

In essence its job is very similar to IEnumerable<T> - to represent a queryable data source - the difference being that the various LINQ methods (on Queryable) can be more specific, to build the query using Expression trees rather than delegates (which is what Enumerable uses).
The expression trees can be inspected by your chosen LINQ provider and turned into an actual query - although that is a black art in itself.
This is really down to the ElementType, Expression and Provider - but in reality you rarely need to care about this as a user. Only a LINQ implementer needs to know the gory details.
Re comments; I'm not quite sure what you want by way of example, but consider LINQ-to-SQL; the central object here is a DataContext, which represents our database-wrapper. This typically has a property per table (for example, Customers), and a table implements IQueryable<Customer>. But we don't use that much directly; consider:
using(var ctx = new MyDataContext()) {
var qry = from cust in ctx.Customers
where cust.Region == "North"
select new { cust.Id, cust.Name };
foreach(var row in qry) {
Console.WriteLine("{0}: {1}", row.Id, row.Name);
}
}
this becomes (by the C# compiler):
var qry = ctx.Customers.Where(cust => cust.Region == "North")
.Select(cust => new { cust.Id, cust.Name });
which is again interpreted (by the C# compiler) as:
var qry = Queryable.Select(
Queryable.Where(
ctx.Customers,
cust => cust.Region == "North"),
cust => new { cust.Id, cust.Name });
Importantly, the static methods on Queryable take expression trees, which - rather than regular IL, get compiled to an object model. For example - just looking at the "Where", this gives us something comparable to:
var cust = Expression.Parameter(typeof(Customer), "cust");
var lambda = Expression.Lambda<Func<Customer,bool>>(
Expression.Equal(
Expression.Property(cust, "Region"),
Expression.Constant("North")
), cust);
... Queryable.Where(ctx.Customers, lambda) ...
Didn't the compiler do a lot for us? This object model can be torn apart, inspected for what it means, and put back together again by the TSQL generator - giving something like:
SELECT c.Id, c.Name
FROM [dbo].[Customer] c
WHERE c.Region = 'North'
(the string might end up as a parameter; I can't remember)
None of this would be possible if we had just used a delegate. And this is the point of Queryable / IQueryable<T>: it provides the entry-point for using expression trees.
All this is very complex, so it is a good job that the compiler makes it nice and easy for us.
For more information, look at "C# in Depth" or "LINQ in Action", both of which provide coverage of these topics.

Although Reed Copsey and Marc Gravell already described about IQueryable (and also IEnumerable) enough,mI want to add little more here by providing a small example on IQueryable and IEnumerable as many users asked for it
Example: I have created two table in database
CREATE TABLE [dbo].[Employee]([PersonId] [int] NOT NULL PRIMARY KEY,[Gender] [nchar](1) NOT NULL)
CREATE TABLE [dbo].[Person]([PersonId] [int] NOT NULL PRIMARY KEY,[FirstName] [nvarchar](50) NOT NULL,[LastName] [nvarchar](50) NOT NULL)
The Primary key(PersonId) of table Employee is also a forgein key(personid) of table Person
Next i added ado.net entity model in my application and create below service class on that
public class SomeServiceClass
{
public IQueryable<Employee> GetEmployeeAndPersonDetailIQueryable(IEnumerable<int> employeesToCollect)
{
DemoIQueryableEntities db = new DemoIQueryableEntities();
var allDetails = from Employee e in db.Employees
join Person p in db.People on e.PersonId equals p.PersonId
where employeesToCollect.Contains(e.PersonId)
select e;
return allDetails;
}
public IEnumerable<Employee> GetEmployeeAndPersonDetailIEnumerable(IEnumerable<int> employeesToCollect)
{
DemoIQueryableEntities db = new DemoIQueryableEntities();
var allDetails = from Employee e in db.Employees
join Person p in db.People on e.PersonId equals p.PersonId
where employeesToCollect.Contains(e.PersonId)
select e;
return allDetails;
}
}
they contains same linq. It called in program.cs as defined below
class Program
{
static void Main(string[] args)
{
SomeServiceClass s= new SomeServiceClass();
var employeesToCollect= new []{0,1,2,3};
//IQueryable execution part
var IQueryableList = s.GetEmployeeAndPersonDetailIQueryable(employeesToCollect).Where(i => i.Gender=="M");
foreach (var emp in IQueryableList)
{
System.Console.WriteLine("ID:{0}, EName:{1},Gender:{2}", emp.PersonId, emp.Person.FirstName, emp.Gender);
}
System.Console.WriteLine("IQueryable contain {0} row in result set", IQueryableList.Count());
//IEnumerable execution part
var IEnumerableList = s.GetEmployeeAndPersonDetailIEnumerable(employeesToCollect).Where(i => i.Gender == "M");
foreach (var emp in IEnumerableList)
{
System.Console.WriteLine("ID:{0}, EName:{1},Gender:{2}", emp.PersonId, emp.Person.FirstName, emp.Gender);
}
System.Console.WriteLine("IEnumerable contain {0} row in result set", IEnumerableList.Count());
Console.ReadKey();
}
}
The output is same for both obviously
ID:1, EName:Ken,Gender:M
ID:3, EName:Roberto,Gender:M
IQueryable contain 2 row in result set
ID:1, EName:Ken,Gender:M
ID:3, EName:Roberto,Gender:M
IEnumerable contain 2 row in result set
So the question is what/where is the difference? It does not seem to
have any difference right? Really!!
Let's have a look on sql queries generated and executed by entity
framwork 5 during these period
IQueryable execution part
--IQueryableQuery1
SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[Gender] AS [Gender]
FROM [dbo].[Employee] AS [Extent1]
WHERE ([Extent1].[PersonId] IN (0,1,2,3)) AND (N'M' = [Extent1].[Gender])
--IQueryableQuery2
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Employee] AS [Extent1]
WHERE ([Extent1].[PersonId] IN (0,1,2,3)) AND (N'M' = [Extent1].[Gender])
) AS [GroupBy1]
IEnumerable execution part
--IEnumerableQuery1
SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[Gender] AS [Gender]
FROM [dbo].[Employee] AS [Extent1]
WHERE [Extent1].[PersonId] IN (0,1,2,3)
--IEnumerableQuery2
SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[Gender] AS [Gender]
FROM [dbo].[Employee] AS [Extent1]
WHERE [Extent1].[PersonId] IN (0,1,2,3)
Common script for both execution part
/* these two query will execute for both IQueryable or IEnumerable to get details from Person table
Ignore these two queries here because it has nothing to do with IQueryable vs IEnumerable
--ICommonQuery1
exec sp_executesql N'SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM [dbo].[Person] AS [Extent1]
WHERE [Extent1].[PersonId] = #EntityKeyValue1',N'#EntityKeyValue1 int',#EntityKeyValue1=1
--ICommonQuery2
exec sp_executesql N'SELECT
[Extent1].[PersonId] AS [PersonId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName]
FROM [dbo].[Person] AS [Extent1]
WHERE [Extent1].[PersonId] = #EntityKeyValue1',N'#EntityKeyValue1 int',#EntityKeyValue1=3
*/
So you have few questions now, let me guess those and try to answer them
Why are different scripts generated for same result?
Lets find out some points here,
all queries has one common part
WHERE [Extent1].[PersonId] IN (0,1,2,3)
why? Because both function IQueryable<Employee> GetEmployeeAndPersonDetailIQueryable and
IEnumerable<Employee> GetEmployeeAndPersonDetailIEnumerable of SomeServiceClass contains one common line in linq queries
where employeesToCollect.Contains(e.PersonId)
Than why is the
AND (N'M' = [Extent1].[Gender]) part is missing in IEnumerable execution part, while in both function calling we used Where(i => i.Gender == "M") inprogram.cs`
Now we are in the point where difference came between IQueryable and
IEnumerable
What entity framwork does when an IQueryable method called, it tooks linq statement written inside the method and try to find out if more linq expressions are defined on the resultset, it then gathers all linq queries defined until the result need to fetch and constructs more appropriate sql query to execute.
It provide a lots of benefits like,
only those rows populated by sql server which could be valid by the
whole linq query execution
helps sql server performance by not selecting unnecessary rows
network cost get reduce
like here in example sql server returned to application only two rows after IQueryable execution` but returned THREE rows for IEnumerable query why?
In case of IEnumerable method, entity framework took linq statement written inside the method and constructs sql query when result need to fetch. it does not include rest linq part to constructs the sql query. Like here no filtering is done in sql server on column gender.
But the outputs are same? Because 'IEnumerable filters the result further in application level after retrieving result from sql server
SO, what should someone choose?
I personally prefer to define function result as IQueryable<T> because there are lots of benefit it has over IEnumerable like, you could join two or more IQueryable functions, which generate more specific script to sql server.
Here in example you can see an IQueryable Query(IQueryableQuery2) generates a more specific script than IEnumerable query(IEnumerableQuery2) which is much more acceptable in my point of view.

It allows for further querying further down the line. If this was beyond a service boundary say, then the user of this IQueryable object would be allowed to do more with it.
For instance if you were using lazy loading with nhibernate this might result in graph being loaded when/if needed.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Linq Query for a particular scenario - c#

Related

Linq query without using Joins for two tables distinct value

Linq Multiple Joins

Linq Query to Get Distinct Records from Two Tables

Structuring large SQL rowset(s) and consuming in .NET

Using IQueryable with Linq

Categories

Resources