Take a look at this pseudo schema (please note this is a simplification, so please try not to comment too heavily on the "advisability" of the schema itself). Assume indexes are in place on the FKs.
TABLE Lookup (
Lookup_ID int not null PK
Name nvarchar(255) not null
)
TABLE Document (
Document_ID int not null PK
Previous_ID int null FK REFERENCES Document(Document_ID)
)
TABLE Document_Lookup (
Document_ID int not null FK REFERENCES Document(Document_ID)
Lookup_ID int not null FK REFERENCES Lookup(Lookup_ID)
)
Volumes: Document, 4 million rows, of which 90% have a null Previous_ID value; Lookup, 6000 rows; an average of 20 lookups attached to each document, giving Document_Lookup 80 million rows.
Now, in a .NET service, I have a structure to represent a Lookup row like this:-
struct Lookup
{
public int ID;
public string Name;
public List<int> DocumentIDs;
}
and the lookup rows are stored in a Dictionary<int, Lookup> where the key is the lookup ID. An important point here is that this dictionary should contain only entries where the Lookup is referenced by at least one document, i.e., the list DocumentIDs should have Count > 0.
My task is to populate this dictionary efficiently. So the simple approach would be:-
SELECT dl.Lookup_ID, l.Name, dl.Document_ID
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Document_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID, dl.Document_ID
This could then be used to populate the dictionary fairly efficiently.
The Question: Does the underlying rowset delivery (TDS?) perform some optimisation? It seems to me that queries that de-normalise data are very common, so the possibility that field values don't change from one row to the next is high; hence it would make sense to optimise the stream by not sending field values that haven't changed. Does anyone know whether such an optimisation is in place? (Such an optimisation does not appear to exist.)
What more sophisticated query could I use to eliminate the duplication (I'm thinking specifically of the repeated Name value)? I've heard of such a thing as a "nested rowset"; can that sort of thing be generated? Would it be more performant? How would I access it in .NET?
I would perform two queries: one to populate the Lookup dictionary, then a second to populate the dictionary lists. I would then add code to knock out the unused Lookup entries. However, imagine I got my predictions wrong and Lookup ended up being 1 million rows with only a quarter actually referenced by any document?
As long as the names are relatively short in practice, the optimisation may not be necessary.
The easiest optimisation is to split it into two queries, one to get the names, the other to get the Document_ID list. (They can be run in the other order if that makes it easier to populate your data structures.)
Example:
/*First get the name of the Lookup*/
select distinct dl.Lookup_ID, l.Name
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Document_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID
/*Now get the list of Document_IDs for each*/
SELECT dl.Lookup_ID, dl.Document_ID
FROM Document_Lookup dl
INNER JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
INNER JOIN Document d ON d.Document_ID = dl.Document_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID, dl.Document_ID
There are also various tricks you could use to massage these into a single table, but I suggest these are not worthwhile.
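As a rough illustration of the two-query approach, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for SQL Server. The sample rows and the dictionary layout (`name`, `document_ids`) are assumptions made for the example; only the table and column names come from the question.

```python
import sqlite3

# In-memory stand-in for the real database; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Lookup (Lookup_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Document (Document_ID INTEGER PRIMARY KEY, Previous_ID INTEGER NULL);
CREATE TABLE Document_Lookup (Document_ID INTEGER NOT NULL, Lookup_ID INTEGER NOT NULL);
INSERT INTO Lookup VALUES (1, 'One'), (2, 'Two'), (3, 'Unused');
INSERT INTO Document VALUES (1, NULL), (2, NULL), (3, 1);
INSERT INTO Document_Lookup VALUES (1, 1), (2, 1), (2, 2), (3, 3);
""")

# Query 1: each referenced lookup and its name, sent once per lookup.
names_sql = """
SELECT DISTINCT dl.Lookup_ID, l.Name
FROM Document_Lookup dl
JOIN Lookup l ON l.Lookup_ID = dl.Lookup_ID
JOIN Document d ON d.Document_ID = dl.Document_ID
WHERE d.Previous_ID IS NULL
"""

# Query 2: only the ID pairs, so the name is never repeated.
ids_sql = """
SELECT dl.Lookup_ID, dl.Document_ID
FROM Document_Lookup dl
JOIN Document d ON d.Document_ID = dl.Document_ID
WHERE d.Previous_ID IS NULL
ORDER BY dl.Lookup_ID, dl.Document_ID
"""

lookups = {
    lookup_id: {"name": name, "document_ids": []}
    for lookup_id, name in conn.execute(names_sql)
}
for lookup_id, document_id in conn.execute(ids_sql):
    lookups[lookup_id]["document_ids"].append(document_id)
```

Because the first query already filters to referenced lookups, no clean-up pass for unused entries is needed in this sketch.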
The hierarchical rowsets you are thinking of come from the MSDASHAPE OLEDB provider. It can do what you are suggesting, but would restrict you to using the OLEDB provider for SQL, which may not be what you want.
Finally, consider carefully crafted XML.
For example:
select
l.lookup_ID as "@l",
l.name as "@n",
(
select dl.Document_ID as "node()", ' ' as "node()"
from Document_Lookup dl where dl.lookup_ID = l.lookup_ID for xml path(''), type
) as "*"
from Lookup l
where l.lookup_ID in (select dl.lookup_ID from Document_Lookup dl)
for xml path('dl')
returns:
<dl l="1" n="One">1 2 </dl>
<dl l="2" n="Two">2 </dl>
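On the client side, a result shaped like this can be unpacked with any XML parser. The following is a minimal Python sketch, assuming the rows arrive as one concatenated fragment (hence the artificial `root` wrapper) and that the dictionary layout is whatever suits your application.

```python
import xml.etree.ElementTree as ET

# The fragment returned by the query has several top-level <dl> elements,
# so wrap it in a synthetic root element before parsing (an assumption of
# this sketch, not something the server does for you).
xml_fragment = '<dl l="1" n="One">1 2 </dl><dl l="2" n="Two">2 </dl>'
root = ET.fromstring("<root>" + xml_fragment + "</root>")

lookups_from_xml = {}
for element in root.iter("dl"):
    lookup_id = int(element.get("l"))
    # The element text is a space-separated list of Document_ID values.
    document_ids = [int(token) for token in (element.text or "").split()]
    lookups_from_xml[lookup_id] = {
        "name": element.get("n"),
        "document_ids": document_ids,
    }
```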
When you ask about "nested rowsets", are you referring to using the DbDataReader.NextResult() method?
If your query has two "outputs" (two select statements that each return a separate resultset), you can loop through the first using DbDataReader.Read(), and when that returns false you can call DbDataReader.NextResult() and then use DbDataReader.Read() again to continue.
var reader = cmd.ExecuteReader();
while(reader.Read()){
// load data
}
if(reader.NextResult()){
while(reader.Read()){
// lookup record from first result
// load data from second result
}
}
I've done this frequently to reduce duplicate data in a similar situation and it works really well:
SELECT * FROM tableA WHERE [condition]
SELECT * FROM tableB WHERE EXISTS (SELECT * FROM tableA WHERE [condition] AND tableB.FK = tableA.PK)
Disclaimer: I haven't tried this with a resultset as large as you're describing.
The downside of this is that you'll need a way to map the second resultset to the first, using a hashtable or an ordered list.
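If both resultsets are ordered by the shared key, the mapping can be done in a single forward pass without a hashtable. The sketch below is one possible way to do it in Python; the function name `attach_children` and the tuple shapes are hypothetical.

```python
def attach_children(parents, children):
    """Attach child values to their parent rows in one forward pass.

    parents:  list of (pk, name) tuples, ordered by pk
    children: list of (fk, value) tuples, ordered by fk
    Returns a list of (pk, name, [values]) tuples.
    """
    result = [(pk, name, []) for pk, name in parents]
    index = 0
    for fk, value in children:
        # Advance the parent cursor until it reaches or passes this key.
        while index < len(result) and result[index][0] < fk:
            index += 1
        # Children without a matching parent are skipped.
        if index < len(result) and result[index][0] == fk:
            result[index][2].append(value)
    return result


merged = attach_children(
    [(1, "a"), (2, "b"), (4, "c")],
    [(1, 10), (1, 11), (3, 99), (4, 40)],
)
```

A hashtable keyed on the primary key is simpler when ordering cannot be guaranteed; the ordered pass avoids the extra lookups when it can.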
class Demo
{
public int Id { get; set; }
public string Name { get; set; }
public int Parent { get; set; }
public IList<Demo> children { get; set; }
}
Now my code returns a List<Demo> named demos. The data is nested three levels deep: each Demo in the list can contain a list of children (call it Children1) whose Parent matches that Demo's Id, and each child in Children1 can in turn contain another list of children whose Parent matches the child's Id. How can I write an optimized query for this?
The demos list will hold a large amount of data. For each Demo, the items whose Parent matches its Id are fetched as its children (Children1), and then for each child in Children1 the items whose Parent matches that child's Id fill another list of children.
Since you are saying Linq To SQL, I assume this has a backing table with a self join which is already an optimized way of defining such structures. If you generate your Linq To SQL model from such a table then your model would already have navigational properties for 'parent' and 'children'. SQL Server sample database's Employees table is a good example for this. Based on its model you would have something like:
var e = Employees.Select(em => new {
em.EmployeeID,
em.FirstName,
em.LastName,
em.ReportsToChildren
});
and that would generate this SQL:
SELECT [t0].[EmployeeID], [t0].[FirstName], [t0].[LastName], [t1].[EmployeeID] AS [EmployeeID2], [t1].[LastName] AS [LastName2], [t1].[FirstName] AS [FirstName2], [t1].[Title], [t1].[TitleOfCourtesy], [t1].[BirthDate], [t1].[HireDate], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[HomePhone], [t1].[Extension], [t1].[Photo], [t1].[Notes], [t1].[ReportsTo], [t1].[PhotoPath], (
SELECT COUNT(*)
FROM [Employees] AS [t2]
WHERE [t2].[ReportsTo] = [t0].[EmployeeID]
) AS [value]
FROM [Employees] AS [t0]
LEFT OUTER JOIN [Employees] AS [t1] ON [t1].[ReportsTo] = [t0].[EmployeeID]
ORDER BY [t0].[EmployeeID], [t1].[EmployeeID]
Calling ToList() on this is just a detail that forces enumeration.
Note: Using a utility like LinqPad, you can test this quickly and easily.
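If the rows are fetched flat (one row per item with its Id and Parent), the nesting can also be rebuilt in memory in a single pass, whatever the depth. The Python sketch below illustrates that idea only; the `build_tree` helper and the sample rows are hypothetical, and it assumes a root is any row whose Parent does not match an existing Id.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Demo:
    id: int
    name: str
    parent: Optional[int]
    children: List["Demo"] = field(default_factory=list)


def build_tree(rows):
    """Turn flat (id, name, parent) rows into a list of root nodes."""
    nodes = {row_id: Demo(row_id, name, parent) for row_id, name, parent in rows}
    roots = []
    for node in nodes.values():
        parent_node = nodes.get(node.parent)
        if parent_node is None:
            # No row has this Parent value as its Id, so treat it as a root.
            roots.append(node)
        else:
            parent_node.children.append(node)
    return roots


rows = [(1, "root", None), (2, "child", 1), (3, "grandchild", 2), (4, "child2", 1)]
roots = build_tree(rows)
```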
I have data in two tables, and I need to get all of the data in one query and join the results.
SELECT
kpip.PersonalName,
kpiT.Name,
kpiPR.KpiTarget,
kpiPR.KpiResultDate,
kpiPR.KpiResult
FROM KpiPersonalResult AS kpiPR join KpiPersonal as kpip
on kpiPR.KpiPersonal = kpip.Id join KpiType AS kpiT
on kpip.KpiType = kpiT.Id join MerchantAdministrators as merA
on kpiPR.KpiAdded = merA.Id and kpiPR.KpiResultDate between '2021-04-07' and '2021-04-08'
select
kpiP.PersonalName,
kpiT.Name,
kpiP.KpiTarget
from KpiPersonal as kpiP join KpiType as kpiT
on kpiP.KpiType = kpiT.Id
Based on the fact that the second query has 3 columns of the same name as the first query, I guess you mean to union them:
SELECT
kpip.PersonalName,
kpiT.Name,
kpiPR.KpiTarget,
kpiPR.KpiResultDate,
kpiPR.KpiResult
FROM
KpiPersonalResult AS kpiPR
join KpiPersonal as kpip on kpiPR.KpiPersonal = kpip.Id
join KpiType AS kpiT on kpip.KpiType = kpiT.Id
join MerchantAdministrators as merA on kpiPR.KpiAdded = merA.Id and kpiPR.KpiResultDate between '2021-04-07' and '2021-04-08'
UNION ALL
select
kpiP.PersonalName,
kpiT.Name,
kpiP.KpiTarget,
null, --put suitable default values for the other columns here
null
from
KpiPersonal as kpiP
join KpiType as kpiT on kpiP.KpiType = kpiT.Id
Unioned queries need the same number of columns. I've inserted NULL as the default value for the two columns that are missing from the second query (relative to the first).
UNION makes a resultset grow taller. If you intended for it to grow wider, that is done via JOIN. A simple pattern for doing so is:
WITH query1 AS(
--query 1 here
), query2 AS (
--query2 here
)
SELECT * FROM query1 JOIN query2 ON ...
Side note on formatting and indenting: most people find SQL most readable when all related operations are at the same indent level. For example, in a typical query the SELECT, FROM, WHERE, GROUP BY and ORDER BY keywords are all at the same indent level, with the blocks that relate to them (the list of selected columns, the list of joined tables, the list of WHERE predicates, etc.) indented one level further. We also typically don't use AS when aliasing tables, but we do use it when aliasing columns in the SELECT.
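To make the "taller versus wider" distinction concrete, here is a small Python sketch run against an in-memory SQLite database. The `Target` and `Result` tables and their rows are simplified, hypothetical stand-ins for the KPI tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Target (Person TEXT, KpiTarget INTEGER);
CREATE TABLE Result (Person TEXT, KpiResult INTEGER);
INSERT INTO Target VALUES ('Ann', 10), ('Bob', 20);
INSERT INTO Result VALUES ('Ann', 7);
""")

# UNION ALL stacks the rows: same column count, NULL pads the missing value.
taller = conn.execute("""
SELECT Person, KpiResult FROM Result
UNION ALL
SELECT Person, NULL FROM Target
""").fetchall()

# JOIN puts matching rows side by side: more columns, not more rows.
wider = conn.execute("""
WITH query1 AS (SELECT Person, KpiTarget FROM Target),
     query2 AS (SELECT Person, KpiResult FROM Result)
SELECT query1.Person, query1.KpiTarget, query2.KpiResult
FROM query1 JOIN query2 ON query2.Person = query1.Person
""").fetchall()
```

Here `taller` holds three two-column rows, while `wider` holds a single three-column row for the one person present in both tables.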
I have some SQL tables that I need to query information from. My current query, which returns a single-column list, is:
from f in FactSales
where f.DateKey == 20130921
where f.CompanyID <= 1
join item in DimMenuItems
on f.MenuItemKey equals item.MenuItemKey
join dmi in DimMenuItemDepts
on item.MenuItemDeptKey equals dmi.MenuItemDeptKey
group f by dmi.MenuItemDeptKey into c
select new {
Amount = c.Sum(l=>l.Amount)
}
This returns the data I want, and it groups correctly by the third table I join, but I cannot get the Description column from the dmi table. I have tried to add the field
Description = dmi.Description
but it doesn't work. How can I get data from the third table into the new select that I am creating with this statement? Many thanks for any help.
Firstly you are using Entity Framework COMPLETELY WRONG. Linq is NOT SQL.
You shouldn't be using join. Instead you should be using Associations.
So instead, your query should look like...
from sale in FactSales
where sale.DateKey == 20130921
where sale.CompanyID <= 1
group sale by sale.Item.Department into c
select new
{
Amount = c.Sum(l => l.Amount),
Department = c.Key
}
By following Associations, you will automatically be implicitly joining.
You should not be grouping by the id of the "table" but by the actual "row". In object parlance (which is what you should be using in EF, since the raison d'etre of an ORM is to convert DB to objects), you should be grouping by the "entity" rather than by the "entity's key".
EF already knows that the key is unique to the entity.
After the group clause, only the group c is in scope: its elements are the sale rows and its Key is sale.Item.Department. Grouping is a transform, rather than an operator like in SQL.
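For comparison, the relational shape of "group by the department and keep its description" can be sketched in plain SQL. The Python snippet below runs such a query against an in-memory SQLite database with made-up rows; it is not the SQL that Entity Framework would generate, only an illustration of carrying the description along with the grouping key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimMenuItemDepts (MenuItemDeptKey INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE DimMenuItems (MenuItemKey INTEGER PRIMARY KEY, MenuItemDeptKey INTEGER);
CREATE TABLE FactSales (DateKey INTEGER, CompanyID INTEGER, MenuItemKey INTEGER, Amount REAL);
INSERT INTO DimMenuItemDepts VALUES (1, 'Food'), (2, 'Drinks');
INSERT INTO DimMenuItems VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO FactSales VALUES
    (20130921, 1, 10, 5.0),
    (20130921, 1, 11, 2.5),
    (20130921, 1, 20, 3.0),
    (20130922, 1, 20, 9.0);
""")

# Grouping by the key and the description together keeps the description
# available in the select list next to the aggregated amount.
totals = conn.execute("""
SELECT dmi.MenuItemDeptKey, dmi.Description, SUM(f.Amount) AS Amount
FROM FactSales f
JOIN DimMenuItems item ON item.MenuItemKey = f.MenuItemKey
JOIN DimMenuItemDepts dmi ON dmi.MenuItemDeptKey = item.MenuItemDeptKey
WHERE f.DateKey = 20130921 AND f.CompanyID <= 1
GROUP BY dmi.MenuItemDeptKey, dmi.Description
ORDER BY dmi.MenuItemDeptKey
""").fetchall()
```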
I have two Tables - tblExpenses and tblCategories as follows
tblExpenses
ID (PK),
Place,
DateSpent,
CategoryID (FK)
tblCategory
ID (PK),
Name
I tried various LINQ approaches to get all distinct records from the above two tables, but without much success. I tried using UNION and DISTINCT but it didn't work.
The above two tables are defined in my Model section of my project which in turn will create tables in SQLite. I need to retrieve all the distinct records from both the tables to display values in gridview.
Kindly provide me some inputs to accomplish this task. I did some research to find answer to this question but nothing seemed close to what I wanted. Excuse me if I duplicated this question.
Here is the UNION, DISTINCT approaches I tried:
DISTINCT # ==> Gives me Repetitive values
(from exp in db.Table<tblExpenses >()
from cat in db.Table<tblCategory>()
select new { exp.Id, exp.CategoryId, exp.DateSpent, exp.Expense, exp.Place, cat.Name }).Distinct();
UNION # ==> Got an error while using UNION
I think Union already does the distinct for you. When you combine the two tables you can try something like
var query=(from c in db.tblExpenses select c).Concat(from c in
db.tblCategory select c).Distinct().ToList();
You will always get DISTINCT records, since you are selecting the tblExpenses.ID too. (Unless there are multiple categories with the same ID. But that of course would be really, really bad design.)
Remember, when making a JOIN in LINQ, both field names and data types should be the same. Is the field tblExpenses.CategoryID a nullable field?
If so, try this JOIN:
db.Table<tblExpenses>()
.Join(db.Table<tblCategory>(),
exp => new { exp.CategoryId },
cat => new { CategoryId = (int?)cat.ID },
(exp, cat) => new {
exp.Id,
exp.CategoryId,
exp.DateSpent,
exp.Expense,
exp.Place,
cat.Name
})
.Select(j => new {
j.Id,
j.CategoryId,
j.DateSpent,
j.Expense,
j.Place,
j.Name
});
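The repetitive values in the question most likely come from pairing every expense with every category, which is what two `from` clauses without a condition do. The Python sketch below shows the difference in row counts on an in-memory SQLite database; the sample rows are invented and the column list is trimmed for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCategory (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tblExpenses (
    ID INTEGER PRIMARY KEY,
    Place TEXT,
    DateSpent TEXT,
    CategoryID INTEGER REFERENCES tblCategory(ID)
);
INSERT INTO tblCategory VALUES (1, 'Food'), (2, 'Travel');
INSERT INTO tblExpenses VALUES
    (1, 'Cafe', '2013-01-01', 1),
    (2, 'Station', '2013-01-02', 2),
    (3, 'Bakery', '2013-01-03', 1);
""")

# No join condition: every expense is paired with every category (3 x 2 rows).
cross_rows = conn.execute(
    "SELECT e.ID, c.Name FROM tblExpenses e, tblCategory c"
).fetchall()

# Join on the foreign key: exactly one row per expense.
joined_rows = conn.execute("""
SELECT e.ID, e.Place, c.Name
FROM tblExpenses e
JOIN tblCategory c ON c.ID = e.CategoryID
ORDER BY e.ID
""").fetchall()
```

Distinct cannot collapse the cross-joined rows because each pairing differs in at least one column.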
You can try these queries:
A SELECT DISTINCT query like this:
SELECT DISTINCT Name FROM tblCategory INNER JOIN tblExpenses ON tblCategory.categoryID = tblExpenses.categoryID;
limits the results to unique values in the output field. The query results are not updateable.
or
A SELECT DISTINCTROW query like this:
SELECT DISTINCTROW Name FROM tblCategory INNER JOIN tblExpenses ON tblCategory.categoryID = tblExpenses.categoryID;
looks at the entire underlying tables, not just the output fields, to find unique rows.
reference:http://www.fmsinc.com/microsoftaccess/query/distinct_vs_distinctrow/unique_values_records.asp
I have 3 tables in my sql database like these :
Documents : (DocID, FileName) //list of all docs that were attached to items
Items : (ItemID, ...) //list of all items
DocumentRelation : (DocID, ItemID) //the relation between docs and items
In my WinForms application I show all records of the Items table in a grid view and let the user select several rows. If the user then presses the EditAll button, another grid view should be filled with the file names of the documents related to the selected items; not all such documents, though,
just the documents that are related to ALL of the selected items.
Is there any query (SQL or LINQ) to select these documents?
Try something like:
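// requires: using System.Collections.Generic;
// Build one SELECT per selected item and combine them with INTERSECT.
// Item.Id is assumed to be an integer; use parameters for anything user-supplied.
var parts = new List<string>();
foreach (var item in SelectedItems)
{
parts.Add("select DocID from DocumentRelation where ItemID = " + item.Id);
}
string query = string.Join(" INTERSECT ", parts);
Then execute the query.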
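The same INTERSECT idea can be written with placeholders instead of concatenated values. The following Python sketch uses an in-memory SQLite database with made-up rows; the helper name `documents_for_all_items` is hypothetical, and the engine's limit on the number of compound SELECT terms still applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DocumentRelation (DocID INTEGER, ItemID INTEGER, PRIMARY KEY (DocID, ItemID));
INSERT INTO DocumentRelation VALUES
    (1, 100), (1, 101),
    (2, 100),
    (3, 100), (3, 101), (3, 102);
""")


def documents_for_all_items(connection, item_ids):
    """Return DocIDs related to every item, using one INTERSECT branch per item."""
    if not item_ids:
        return []
    branch = "SELECT DocID FROM DocumentRelation WHERE ItemID = ?"
    sql = " INTERSECT ".join([branch] * len(item_ids))
    # Each placeholder is bound to one item ID, so no values are spliced into the SQL.
    return sorted(row[0] for row in connection.execute(sql, list(item_ids)))


docs_two_items = documents_for_all_items(conn, [100, 101])
docs_three_items = documents_for_all_items(conn, [100, 101, 102])
```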
Build one string by appending each ItemID, comma separated (for example 1,2,3), and then write a query like
declare @ItemID varchar(50);
set @ItemID='1,2,3';
select FileName
from Documents
Left Join DocumentRelation on Documents.DocId = DocumentRelation.DocId
where
DocumentRelation.ItemID in (select * from dbo.SplitString(@ItemID))
and then create a function in the database like the one below
ALTER FUNCTION [dbo].[SplitString] (@OrderList varchar(1000))
RETURNS @ParsedList table (OrderID varchar(1000) )
AS BEGIN
IF @OrderList = ''
BEGIN
set @OrderList='Null'
end
DECLARE @OrderID varchar(1000), @Pos int
SET @OrderList = LTRIM(RTRIM(@OrderList))+ ','
SET @Pos = CHARINDEX(',', @OrderList, 1)
IF REPLACE(@OrderList, ',', '') <> ''
BEGIN
WHILE @Pos > 0
BEGIN
SET @OrderID = LTRIM(RTRIM(LEFT(@OrderList, @Pos - 1)))
IF @OrderID <> ''
BEGIN
INSERT INTO @ParsedList (OrderID)
VALUES (CAST(@OrderID AS varchar(1000)))
--Use Appropriate conversion
END
SET @OrderList = RIGHT(@OrderList, LEN(@OrderList) - @Pos)
SET @Pos = CHARINDEX(',', @OrderList, 1)
END
END
RETURN
END
Linq
var td =
from s in Items
join r in DocumentRelation on s.ItemID equals r.ItemID
join k in Documents on r.DocID equals k.DocID
// Coll is the collection of selected ItemIDs, stored when the user clicks a grid view row
where Coll.Contains(s.ItemID)
select new
{
FileName = k.FileName,
DocumentID = k.DocID
};
You can loop through the td collection and bind it to your grid view.
SQL
Create a stored proc to get the relevant documents for the ItemIDs selected in the grid view, and parameterize your IN clause
select k.FileName,k.DocId from Items as s inner join
DocumentRelation as r on
s.ItemID=r.ItemID and r.ItemId in (pass the collection of selected ItemIDs above as an input to the SP)
inner join Documents as k
on k.DocId=r.DocId
You can get the information on how to parametrize your sql query
Here's one approach. I'll let you figure out how you want to supply the list of items as arguments. I also assume that (DocID, ItemID) is a primary key in the relations table. The having condition is what enforces your requirement that all selected items are related to each document you're seeking.
;with ItemsSelected as (
select i.ItemID
from Items as i
where i.ItemID in (<list of selected ItemIDs>)
)
select dr.DocID
from DocumentRelation as dr
where dr.ItemID in (select ItemID from ItemsSelected)
group by dr.DocID
having count(dr.ItemID) = (select count(*) from ItemsSelected);
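A runnable sketch of the same group-and-count idea follows, written in Python against an in-memory SQLite database. The sample rows and the helper name `documents_related_to_all` are assumptions; the selected item IDs are de-duplicated first so the count comparison stays valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DocumentRelation (DocID INTEGER, ItemID INTEGER, PRIMARY KEY (DocID, ItemID));
INSERT INTO DocumentRelation VALUES
    (1, 100), (1, 101),
    (2, 100),
    (3, 100), (3, 101), (3, 102);
""")


def documents_related_to_all(connection, item_ids):
    """Return DocIDs that are related to every one of the selected items."""
    unique_ids = sorted(set(item_ids))
    if not unique_ids:
        return []
    placeholders = ", ".join("?" * len(unique_ids))
    # A document qualifies when it matches as many selected items as were asked for.
    sql = f"""
        SELECT DocID
        FROM DocumentRelation
        WHERE ItemID IN ({placeholders})
        GROUP BY DocID
        HAVING COUNT(ItemID) = ?
        ORDER BY DocID
    """
    return [row[0] for row in connection.execute(sql, unique_ids + [len(unique_ids)])]


related_to_two = documents_related_to_all(conn, [100, 101])
related_to_three = documents_related_to_all(conn, [100, 101, 102, 100])
```

The query text grows only by one placeholder per item, which avoids compiling a long chain of set operations.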
EDIT
As far as I can tell, the accepted answer is equivalent to the solution here despite OP's comment below.
I did some quick tests with a very long series of intersect queries and confirmed that you can indeed expect that approach to become gradually slower with an increasing number of selected items. But a much worse problem was the time taken just to compile the queries. I tried this on a very fast server and found that that step took about eight seconds when roughly one hundred intersects were concatenated.
SQL Fiddle didn't let me do anywhere near as many before producing this error (and taking more than ten seconds in the process): The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
There are several possible methods of passing a list of arguments to SQL Server. Assuming that you prefer the dynamic query solution, I'd argue that this version is still better, while also noting that there is a SQL Server limit on the number of values inside the IN list.
There are plenty of ways to have this stuff blow up.