Linq query without using Joins for two tables distinct value


Table 1
Contract ID, Vendor Name, Description, User
Table 2
Contract ID, Product, Department
Match condition: for every Contract ID in table 2 that matches table 1, get its Vendor Name and Contract ID.
Query result output:
Contract ID (distinct), Vendor Name
The code below uses an inner join. I need the same output without using a join, as a LINQ query.
SELECT s1.[Contract ID], s1.[Vendor Name], s2.[Contract ID]
FROM table1 AS s1
INNER JOIN table2 AS s2
ON s1.[Contract ID] = s2.[Contract ID]
Thanks in advance.

Since you only need an alternative to the join for selecting distinct values, you can use subquery logic like the following and write the LINQ from it:
SELECT [Contract ID], [Vendor Name]
FROM table1
WHERE [Contract ID] IN
(SELECT DISTINCT [Contract ID] FROM table2)
This assumes Contract ID is the primary key of table 1, so the outer query cannot return duplicate rows.
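In LINQ, the same IN subquery can also be written as a semi-join with Any(); providers such as Entity Framework usually translate that to an EXISTS subquery. The sketch below runs against hypothetical in-memory lists (the names and data are assumptions), so it only illustrates the shape of the query. Note how the duplicate Contract ID in table 2 does not duplicate the output, unlike an inner join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for table 1 and table 2.
var table1 = new List<(int ContractId, string VendorName)>
{
    (1, "Acme"),
    (2, "Globex"),
    (3, "Initech"),
};
var table2 = new List<(int ContractId, string Product)>
{
    (1, "Widget"),
    (1, "Gadget"),   // Contract ID 1 appears twice in table 2
    (3, "Stapler"),
};

// Semi-join: a table 1 row is kept once if at least one table 2 row
// has the same Contract ID, no matter how many rows match.
var result = table1
    .Where(t1 => table2.Any(t2 => t2.ContractId == t1.ContractId))
    .Select(t1 => new { t1.ContractId, t1.VendorName })
    .ToList();

foreach (var row in result)
{
    Console.WriteLine($"{row.ContractId} {row.VendorName}");
}
```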

If I understand correctly, you want to retrieve a collection of objects containing Contract Id and Vendor Names, without duplicates, whose Contract Id is found in Table 2.
It is unclear if you are using Linq to objects, Linq to Entities, or any other Linq flavor, which would have a meaningful impact on how to best construct a query for a specific purpose.
But as a first hint, here is a way to perform this without join with Linq:
// Get a list of all distinct Contract Ids in Table 2
var allTable2ContractIds = Table2
    .Select(e => e.ContractId)
    .Distinct()
    .ToList();
// With Table 1:
// keep only entries whose Contract Id is found in the list just constructed above,
// transform them to keep Contract Id and Vendor Name,
// and drop repeated (Contract Id, Vendor Name) pairs.
// The use of ToList() at the end is not mandatory;
// it depends on whether you want to materialize the result or not.
var result = Table1
    .Where(e => allTable2ContractIds.Contains(e.ContractId))
    .Select(e => new
    {
        e.ContractId,
        e.VendorName
    })
    .Distinct()
    .ToList();
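If the data is already in memory (LINQ to Objects), the same approach can be tried end to end. In this sketch the data is hypothetical, a HashSet stands in for the List so each Contains check is a hash lookup rather than a scan, and Distinct() removes repeated (ContractId, VendorName) pairs, which matters when table 1 repeats a Contract ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data; table 1 repeats Contract ID 1 with the same vendor.
var table1 = new List<(int ContractId, string VendorName, string Description)>
{
    (1, "Acme", "Office supplies"),
    (1, "Acme", "Office furniture"),
    (2, "Globex", "Consulting"),
    (3, "Initech", "Software"),
};
var table2 = new List<(int ContractId, string Department)>
{
    (1, "Sales"),
    (3, "IT"),
    (3, "HR"),
};

// ToHashSet() also removes duplicate IDs, so no separate Distinct() is needed here.
var table2ContractIds = table2.Select(e => e.ContractId).ToHashSet();

// Anonymous types compare by value, so Distinct() collapses identical pairs.
var result = table1
    .Where(e => table2ContractIds.Contains(e.ContractId))
    .Select(e => new { e.ContractId, e.VendorName })
    .Distinct()
    .ToList();

foreach (var row in result)
{
    Console.WriteLine($"{row.ContractId} {row.VendorName}");
}
```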

Related

Linq Query for a particular scenario

class Demo
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Parent { get; set; }
    public IList<Demo> Children { get; set; }
}
My code currently returns a List<Demo>. The data is nested three levels deep: each item's Children list contains the items whose Parent matches that item's Id, and each of those children in turn has its own Children list, again matched by Id and Parent. How can I write an optimized query for this?
The list of Demos will hold a huge amount of data: for each Demo, the items whose Parent matches its Id are fetched into a list of children (say Children1), and then, for each child in Children1, the items whose Parent matches that child's Id are fetched into another list of children.

Since you mention LINQ to SQL, I assume this is backed by a table with a self-referencing foreign key (a self join), which is already an optimized way of defining such structures. If you generate your LINQ to SQL model from such a table, the model already has navigation properties for the parent and the children. The Employees table in the Northwind sample database for SQL Server (where ReportsTo references EmployeeID) is a good example. Based on its model, you would write something like:
var e = Employees.Select(em => new {
em.EmployeeID,
em.FirstName,
em.LastName,
em.ReportsToChildren
});
and that would generate this SQL:
SELECT [t0].[EmployeeID], [t0].[FirstName], [t0].[LastName], [t1].[EmployeeID] AS [EmployeeID2], [t1].[LastName] AS [LastName2], [t1].[FirstName] AS [FirstName2], [t1].[Title], [t1].[TitleOfCourtesy], [t1].[BirthDate], [t1].[HireDate], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[HomePhone], [t1].[Extension], [t1].[Photo], [t1].[Notes], [t1].[ReportsTo], [t1].[PhotoPath], (
SELECT COUNT(*)
FROM [Employees] AS [t2]
WHERE [t2].[ReportsTo] = [t0].[EmployeeID]
) AS [value]
FROM [Employees] AS [t0]
LEFT OUTER JOIN [Employees] AS [t1] ON [t1].[ReportsTo] = [t0].[EmployeeID]
ORDER BY [t0].[EmployeeID], [t1].[EmployeeID]
Calling ToList() on this only determines when the query is executed and materialized; it does not change the generated SQL.
Note: using a utility like LINQPad, you can test this quickly and easily.
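If the rows are already in memory (LINQ to Objects rather than LINQ to SQL), one way to avoid re-scanning the whole list for every level is to group all rows by Parent once with ToLookup and then assemble the three levels from that lookup. This is a swapped-in technique, not the navigation-property approach above. The sketch uses tuples and anonymous types in place of the Demo class; the data and the convention that Parent = 0 marks a top-level item are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat rows (Id, Name, Parent); Parent = 0 is assumed to mean "top level".
var rows = new List<(int Id, string Name, int Parent)>
{
    (1, "Root A", 0),
    (2, "Root B", 0),
    (3, "A.1", 1),
    (4, "A.2", 1),
    (5, "A.1.x", 3),
    (6, "B.1", 2),
};

// One pass groups every row by Parent; a missing key yields an empty sequence.
var byParent = rows.ToLookup(r => r.Parent);

// Three fixed levels, matching the question: top items, Children1, and their children.
var tree = byParent[0]
    .Select(top => new
    {
        top.Id,
        top.Name,
        Children = byParent[top.Id]
            .Select(child => new
            {
                child.Id,
                child.Name,
                Children = byParent[child.Id]
                    .Select(grandChild => new { grandChild.Id, grandChild.Name })
                    .ToList()
            })
            .ToList()
    })
    .ToList();

foreach (var node in tree)
{
    foreach (var kid in node.Children)
    {
        Console.WriteLine($"{node.Name} > {kid.Name} ({kid.Children.Count} children)");
    }
}
```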

Linq Multiple Joins

I have some SQL tables that I need to query information from. My current query, which returns a single-column list, is:
from f in FactSales
where f.DateKey == 20130921
where f.CompanyID <= 1
join item in DimMenuItems
on f.MenuItemKey equals item.MenuItemKey
join dmi in DimMenuItemDepts
on item.MenuItemDeptKey equals dmi.MenuItemDeptKey
group f by dmi.MenuItemDeptKey into c
select new {
Amount = c.Sum(l=>l.Amount)
}
This returns the data I want and it groups correctly by the third table I join but I cannot get the Description column from the dmi table. I have tried to add the field
Description = dmi.Description
but it doesn't work. How can I get data from the third table into the new object I am selecting with this statement? Many thanks for any help.

Firstly, you are using Entity Framework COMPLETELY WRONG. LINQ is NOT SQL.
You shouldn't be using join. Instead, you should be using associations (navigation properties).
So instead, your query should look like this:
from sale in FactSales
where sale.DateKey == 20130921
where sale.CompanyID <= 1
// Item and Department are navigation properties (associations) in the EF model
group sale by sale.Item.Department into c
select new
{
    Amount = c.Sum(l => l.Amount),
    Department = c.Key
}
By following associations, you join implicitly.
You should not be grouping by the ID of the "table" but by the actual "row". In object terms (which is how you should be thinking in EF, since the raison d'être of an ORM is to map the database to objects), you should be grouping by the "entity" rather than by the "entity's key".
EF already knows that the key is unique to the entity.
After the group keyword, you can only access the group: its elements are the sales, and its Key is sale.Item.Department. It is a transform, rather than an operator like GROUP BY in SQL, which is also why dmi is out of scope after "group f by ... into c" in your original query.
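If you would rather keep the explicit joins from the question, another option (a different technique from the associations approach above) is to put Description into the group key as an anonymous type and read it back through c.Key. The lists below are hypothetical in-memory stand-ins for FactSales, DimMenuItems and DimMenuItemDepts; the column values are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the three tables.
var factSales = new List<(int DateKey, int CompanyID, int MenuItemKey, decimal Amount)>
{
    (20130921, 1, 10, 5.00m),
    (20130921, 1, 11, 7.50m),
    (20130921, 1, 12, 3.25m),
    (20130921, 2, 10, 100m),   // excluded by CompanyID <= 1
};
var dimMenuItems = new List<(int MenuItemKey, int MenuItemDeptKey)>
{
    (10, 1), (11, 1), (12, 2),
};
var dimMenuItemDepts = new List<(int MenuItemDeptKey, string Description)>
{
    (1, "Food"), (2, "Beverage"),
};

// Only c is in scope after "into c", so Description travels in the composite key.
var query =
    from f in factSales
    where f.DateKey == 20130921
    where f.CompanyID <= 1
    join item in dimMenuItems on f.MenuItemKey equals item.MenuItemKey
    join dmi in dimMenuItemDepts on item.MenuItemDeptKey equals dmi.MenuItemDeptKey
    group f by new { dmi.MenuItemDeptKey, dmi.Description } into c
    select new
    {
        c.Key.Description,
        Amount = c.Sum(l => l.Amount)
    };

var result = query.ToList();
foreach (var row in result)
{
    Console.WriteLine($"{row.Description}: {row.Amount}");
}
```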

Get some value based on the max date (query between two tables)

I have two tables, and I would like to get one value based on a max date.
Here are the table structures:
Items (ItemId, Name)
ItemData (ItemDataId, ItemFK, Invoice, EntryDate) - ItemFK is a foreign key to ItemId in the Items table
All I know is the Name of the item. I would like to get the latest Invoice based on the EntryDate (and the name).
I first need to get the ItemId based on the name, then get the invoice based on the ItemId, but only the last one (so using MAX(EntryDate)).
How can I do this using an inner join (or some other SQL join)?

You join to a derived table, which is a subquery with an alias.
select yourfields
from someTable join otherTablesMaybe on something
join (
select id, max(datefield) maxDate
from someTable
where whatever
group by id ) derivedTable on someTable.id = derivedTable.id
and someTable.datefield = maxDate
where whatever
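The two "where whatever" filters should be the same, so that the maximum date in the derived table is computed over the same rows the outer query keeps.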
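For readers doing this in LINQ instead of SQL, a simpler alternative to the derived-table technique is to join Items to ItemData, order by EntryDate descending and take the first invoice. This sketch uses hypothetical in-memory data shaped like the two tables in the question; with a real LINQ provider the same operators would be translated to SQL instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows for Items(ItemId, Name) and ItemData(ItemDataId, ItemFK, Invoice, EntryDate).
var items = new List<(int ItemId, string Name)>
{
    (1, "Pen"), (2, "Paper"),
};
var itemData = new List<(int ItemDataId, int ItemFK, string Invoice, DateTime EntryDate)>
{
    (1, 1, "INV-001", new DateTime(2013, 1, 5)),
    (2, 1, "INV-007", new DateTime(2013, 3, 2)),
    (3, 2, "INV-003", new DateTime(2013, 2, 1)),
    (4, 1, "INV-004", new DateTime(2013, 2, 9)),
};

string name = "Pen";

// Keep the item with the given name, join to its ItemData rows,
// then take the Invoice of the row with the latest EntryDate (null if none).
string latestInvoice = items
    .Where(i => i.Name == name)
    .Join(itemData, i => i.ItemId, d => d.ItemFK, (i, d) => d)
    .OrderByDescending(d => d.EntryDate)
    .Select(d => d.Invoice)
    .FirstOrDefault();

Console.WriteLine(latestInvoice);
```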

Linq Query to Get Distinct Records from Two Tables

I have two tables, tblExpenses and tblCategory, as follows:
tblExpenses
ID (PK),
Place,
DateSpent,
CategoryID (FK)
tblCategory
ID (PK),
Name
I tried various LINQ approaches to get all distinct records from the above two tables, without much success. I tried using UNION and DISTINCT, but neither worked.
The above two tables are defined in the Model section of my project, which in turn creates the tables in SQLite. I need to retrieve all the distinct records from both tables to display the values in a GridView.
Kindly provide me some input to accomplish this task. I did some research to find an answer to this question, but nothing seemed close to what I wanted. Excuse me if this question is a duplicate.
Here are the UNION and DISTINCT approaches I tried:
DISTINCT # ==> Gives me repeated values
(from exp in db.Table<tblExpenses>()
from cat in db.Table<tblCategory>()
select new { exp.Id, exp.CategoryId, exp.DateSpent, exp.Expense, exp.Place, cat.Name }).Distinct();
UNION # ==> Got an error while using UNION

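I think Union already does the distinct. When you combine the two tables, you can try something like:
var query = (from c in db.tblExpenses select c).Concat(from c in
db.tblCategory select c).Distinct().ToList();
Note that Concat (like Union) only compiles when both sequences have the same element type, so the two selects must project to a common type.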

You will always get DISTINCT records, since you are selecting the tblExpenses.ID too. (Unless there are multiple categories with the same ID. But that of course would be really, really bad design.)
Remember that when you join on anonymous-type keys in LINQ, both the property names and the data types of the two keys must be the same. Is the field tblExpenses.CategoryID nullable?
If so, try this JOIN:
db.Table<tblExpenses>()
.Join(db.Table<tblCategory>(),
exp => new { exp.CategoryId },
cat => new { CategoryId = (int?)cat.ID },
(exp, cat) => new {
exp.Id,
exp.CategoryId,
exp.DateSpent,
exp.Expense,
exp.Place,
cat.Name
})
.Select(j => new {
j.Id,
j.CategoryId,
j.DateSpent,
j.Expense,
j.Place,
j.Name
});
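The nullable-key join above can be checked in memory. In this sketch the rows are hypothetical; both key selectors produce an anonymous type with one property named CategoryId of type int?, which is what lets the compiler infer a single key type for Join. An expense with a null CategoryId simply finds no match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows: CategoryId is nullable on the expense side, ID is not on the category side.
var expenses = new List<(int Id, string Place, int? CategoryId)>
{
    (1, "Cafe", 10),
    (2, "Garage", 20),
    (3, "Unknown", null),   // no category, so the inner join drops it
};
var categories = new List<(int ID, string Name)>
{
    (10, "Food"), (20, "Car"),
};

// Same property name and same type (int?) on both sides of the key.
var joined = expenses
    .Join(categories,
        exp => new { exp.CategoryId },
        cat => new { CategoryId = (int?)cat.ID },
        (exp, cat) => new { exp.Id, exp.Place, exp.CategoryId, cat.Name })
    .ToList();

foreach (var row in joined)
{
    Console.WriteLine($"{row.Id} {row.Place} {row.Name}");
}
```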

You can try these queries:
A SELECT DISTINCT query like this:
SELECT DISTINCT Name FROM tblCategory INNER JOIN tblExpenses ON tblCategory.ID = tblExpenses.CategoryID;
limits the results to unique values in the output field. The query results are not updateable.
or
A SELECT DISTINCTROW query like this:
SELECT DISTINCTROW Name FROM tblCategory INNER JOIN tblExpenses ON tblCategory.ID = tblExpenses.CategoryID;
looks at the entire underlying tables, not just the output fields, to find unique rows. Note that this DISTINCTROW behavior is Microsoft Access SQL (the reference below is about Access); SQLite does not support the DISTINCTROW keyword.
Reference: http://www.fmsinc.com/microsoftaccess/query/distinct_vs_distinctrow/unique_values_records.asp
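A LINQ counterpart of the SELECT DISTINCT query is a Join that projects only the category name, followed by Distinct(). The sketch uses hypothetical in-memory rows and assumes a non-nullable CategoryID so the two key types match directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows; "Travel" has no expenses, "Food" has two.
var categories = new List<(int ID, string Name)>
{
    (1, "Food"), (2, "Car"), (3, "Travel"),
};
var expenses = new List<(int Id, string Place, int CategoryID)>
{
    (1, "Cafe", 1), (2, "Bakery", 1), (3, "Garage", 2),
};

// Inner join on the category key, keep only Name, then remove repeats.
var names = categories
    .Join(expenses, c => c.ID, e => e.CategoryID, (c, e) => c.Name)
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", names));
```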

Structuring large SQL rowset(s) and consuming in .NET

Take a look at this pseudo schema (please note this is a simplification, so please try not to comment too heavily on the "advisability" of the schema itself). Assume indexes are in place on the FKs.
TABLE Lookup (
Lookup_ID int not null PK
Name nvarchar(255) not null
)
TABLE Document (
Document_ID int not null PK
Previous_ID int null FK REFERENCES Document(Document_ID)
)
TABLE Document_Lookup (
Document_ID int not null FK REFERENCES Document(Document_ID)
Lookup_ID int not null FK REFERENCES Lookup(Lookup_ID)
)
Volumes: Document has 4 million rows, of which 90% have a null Previous_ID; Lookup has 6,000 rows. On average 20 lookups are attached to each document, giving Document_Lookup 80 million rows.
Now, in a .NET service, I have a structure to represent a Lookup row like this:
struct Lookup
{
public int ID;
public string Name;
public List<int> DocumentIDs;
}
Lookup rows are stored in a Dictionary<int, Lookup> where the key is the lookup ID. An important point here is that this dictionary should contain only entries where the Lookup is referenced by at least one document, i.e., the DocumentIDs list should have Count > 0.
My task is to populate this dictionary efficiently. So the simple approach would be:
SELECT dl.Lookup_ID, l.Name, dl.Document_ID
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Document_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID, dl.Document_ID
This could then be used to populate the dictionary fairly efficiently.
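Populating the dictionary from that single ordered result can be sketched as below. The rows are a hypothetical in-memory stand-in for what the reader would return, and a tuple stands in for the Lookup struct. Because an entry is only created when a row references it, every entry ends up with at least one document ID.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rows as returned by the query, ordered by Lookup_ID, Document_ID.
var rows = new List<(int LookupId, string Name, int DocumentId)>
{
    (1, "One", 1),
    (1, "One", 2),
    (2, "Two", 2),
};

// Tuple stands in for the Lookup struct (ID, Name, DocumentIDs).
var lookups = new Dictionary<int, (int ID, string Name, List<int> DocumentIDs)>();

foreach (var row in rows)
{
    // Create the entry the first time a Lookup_ID is seen. The list is a reference
    // type, so adding to it updates the stored entry without re-inserting it.
    if (!lookups.TryGetValue(row.LookupId, out var lookup))
    {
        lookup = (row.LookupId, row.Name, new List<int>());
        lookups.Add(row.LookupId, lookup);
    }
    lookup.DocumentIDs.Add(row.DocumentId);
}

Console.WriteLine(lookups.Count);
```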
The question: does the underlying rowset delivery (TDS?) perform some optimization? It seems to me that queries that denormalize data are very common, so the likelihood that field values don't change from one row to the next is high; hence it would make sense to optimize the stream by not sending field values that haven't changed. Does anyone know whether such an optimization is in place? (The optimization does not appear to exist.)
What more sophisticated query could I use to eliminate the duplication (I'm thinking specifically of the repeated Name value)? I've heard of such a thing as a "nested rowset"; can that sort of thing be generated? Would it be more performant? How would I access it in .NET?
I could perform two queries: one to populate the Lookup dictionary, then a second to populate the dictionary lists. I would then add code to knock out the unused Lookup entries. However, imagine I got my predictions wrong and Lookup ended up being 1 million rows with only a quarter actually referenced by any document?

As long as the names are relatively short in practice, the optimisation may not be necessary.
The easiest optimisation is to split it into two queries, one to get the names, the other to get the Document_ID list. (can be in the other order if it makes it easier to populate your data structures).
Example:
/*First get the name of the Lookup*/
select distinct dl.Lookup_ID, l.Name
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Document_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID
/*Now get the list of Document_IDs for each*/
SELECT dl.Lookup_ID, dl.Document_ID
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Document_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID, dl.Document_ID
There are also various tricks you could use to massage these into a single table, but I suggest these are not worthwhile.
The hierarchical rowsets you are thinking of are provided by the MSDASHAPE (data shaping) OLE DB provider. They can do what you are suggesting, but would restrict you to using the OLE DB provider for SQL Server, which may not be what you want.
Finally, consider carefully designed XML.
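With the two-query approach, the results can be merged in memory: build the dictionary from the first (names) result, then append document IDs from the second. The data below is a hypothetical stand-in for the two result sets. Since the first query starts from Document_Lookup with the same filter as the second, every name it returns has at least one document, so no clean-up pass for unused lookups is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result of the first query: distinct (Lookup_ID, Name).
var names = new List<(int LookupId, string Name)> { (1, "One"), (2, "Two") };
// Hypothetical result of the second query: (Lookup_ID, Document_ID).
var docs = new List<(int LookupId, int DocumentId)> { (1, 1), (1, 2), (2, 2) };

// One entry per name, each with its own empty list.
var lookups = names.ToDictionary(
    n => n.LookupId,
    n => (ID: n.LookupId, n.Name, DocumentIDs: new List<int>()));

// The indexer returns a copy of the tuple, but DocumentIDs is a shared reference.
foreach (var d in docs)
{
    lookups[d.LookupId].DocumentIDs.Add(d.DocumentId);
}

Console.WriteLine(lookups[1].DocumentIDs.Count);
```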
For example:
select
l.lookup_ID as "@l",
l.name as "@n",
(
select dl.Document_ID as "node()", ' ' as "node()"
from Document_Lookup dl where dl.lookup_ID = l.lookup_ID for xml path(''), type
) as "*"
from Lookup l
where l.lookup_ID in (select dl.lookup_ID from Document_Lookup dl)
for xml path('dl')
returns:
<dl l="1" n="One">1 2 </dl>
<dl l="2" n="Two">2 </dl>
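On the .NET side, this XML can be read with LINQ to XML. The FOR XML output is a fragment (several dl elements with no root), so this sketch wraps it in a hypothetical root element before parsing; the string literal simply reproduces the sample output above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sample FOR XML PATH output from above, wrapped in a made-up <root> for parsing.
string xml = "<dl l=\"1\" n=\"One\">1 2 </dl><dl l=\"2\" n=\"Two\">2 </dl>";
var root = XElement.Parse("<root>" + xml + "</root>");

// Attribute l is the Lookup_ID, n the Name; the text is a space-separated list of Document_IDs.
var lookups = root.Elements("dl").ToDictionary(
    e => (int)e.Attribute("l"),
    e => (Name: (string)e.Attribute("n"),
          DocumentIDs: e.Value
              .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
              .Select(s => int.Parse(s))
              .ToList()));

Console.WriteLine(lookups.Count);
```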

When you're asking about "nested rowsets", are you referring to using the DbDataReader.NextResult() method?
If your query has two "outputs" (two SELECT statements that return separate result sets), you can loop through the first using DbDataReader.Read(), and when that returns false, call DbDataReader.NextResult() and then use DbDataReader.Read() again to continue.
var reader = cmd.ExecuteReader();
while(reader.Read()){
// load data
}
if(reader.NextResult()){
while(reader.Read()){
// lookup record from first result
// load data from second result
}
}
I've done this frequently to reduce duplicate data in a similar situation and it works really well:
SELECT * FROM tableA WHERE [condition]
SELECT * FROM tableB WHERE EXISTS (SELECT * FROM tableA WHERE [condition] AND tableB.FK = tableA.PK)
Disclaimer: I haven't tried this with a resultset as large as you're describing.
The downside of this is that you'll need a way to map the second result set to the first, using a hashtable or an ordered list.
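The NextResult() loop and the hashtable mapping can be tried without a database by using DataTableReader, which derives from DbDataReader and exposes several DataTables as consecutive result sets. The two tables and their contents are hypothetical stand-ins for the tableA/tableB result sets; with SqlDataReader the loop would look the same.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical first result set (parents).
var tableA = new DataTable("tableA");
tableA.Columns.Add("PK", typeof(int));
tableA.Columns.Add("Name", typeof(string));
tableA.Rows.Add(1, "One");
tableA.Rows.Add(2, "Two");

// Hypothetical second result set (children pointing at tableA.PK).
var tableB = new DataTable("tableB");
tableB.Columns.Add("FK", typeof(int));
tableB.Columns.Add("Document_ID", typeof(int));
tableB.Rows.Add(1, 1);
tableB.Rows.Add(1, 2);
tableB.Rows.Add(2, 2);

var parents = new Dictionary<int, (string Name, List<int> Children)>();

using (var reader = new DataTableReader(new[] { tableA, tableB }))
{
    // First result set: one dictionary entry per parent row.
    while (reader.Read())
    {
        parents.Add(reader.GetInt32(0), (reader.GetString(1), new List<int>()));
    }
    // Second result set: map each row back to its parent through the dictionary.
    if (reader.NextResult())
    {
        while (reader.Read())
        {
            parents[reader.GetInt32(0)].Children.Add(reader.GetInt32(1));
        }
    }
}

Console.WriteLine(parents.Count);
```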
