I am working with the Yelp dataset available online, and I've been trying to optimize my query for days. For the schema listed below, I need to construct a query that provides the following:
Given a user's UID, display the most recent review information for each of the user's friends.
Here's the schema:
CREATE TABLE business(
bid varchar(40) PRIMARY KEY,
name varchar(100),
city varchar(40),
state char(2),
zip varchar(10),
latitude real,
longitude real,
address varchar(100),
numreviews INTEGER DEFAULT 0,
numcheckins INTEGER DEFAULT 0,
avgreview float DEFAULT 0,
isopen bool,
stars float
);
CREATE TABLE users(
uid varchar(40) PRIMARY KEY,
name varchar(40),
avgstars float,
fans INTEGER,
coolvotes INTEGER,
reviewcount INTEGER,
funnyvotes INTEGER,
signup varchar(20),
usefulvotes INTEGER,
latitude real,
longitude real
);
CREATE TABLE reviews(
rid varchar(40) PRIMARY KEY,
bid varchar(40),
uid varchar(40),
stars float,
date varchar(20),
funny INTEGER,
useful INTEGER,
cool INTEGER,
text varchar(1024),
FOREIGN KEY (uid) REFERENCES users(uid),
FOREIGN KEY (bid) REFERENCES business(bid)
);
CREATE TABLE friends(
uid varchar(40) REFERENCES users(uid),
fid varchar(40) REFERENCES users(uid)
);
Here's an example of the desired output:
For each of the user's friends, I display the following:
The friend's name
The name of the business from their most recent review
The city of the business from their most recent review
The text from their most recent review
Currently this is the only "solution" I've had success with.
Step 1: Get a list of all of the IDs for each of the user's friends.
SELECT fid from friends where uid = '{userId}'
This returns the user IDs of all of the user's friends, so I basically have a friend ID list.
Step 2: With this information, I run a foreach loop in my program over that list. For each iteration of the friend ID list, I execute the below query and provide the temporary friend ID for the current iteration of the loop:
SELECT U.name, B.name, B.city, R.text, R.date FROM reviews as R, users as U, business as B
WHERE U.uid = '{currentFriendId}'
AND R.uid = '{currentFriendId}'
AND B.bid = R.bid
AND date = (SELECT MAX(date) FROM reviews WHERE uid = '{currentFriendId}')
Each time I run this query inside the loop, I get a single row of the output I want.
This is great...except I have to run this query for every single one of the user's friends. This is extremely costly.
Goal: I'm trying to combine these 2 queries, or revamp them completely, to generate all of the rows at once in a single query.
Question: Given the information provided, how can I fix my queries to generate all of this information from a single query?
It looks like a top-n-per-group problem.
One way to do it is to use a lateral join.
Make sure you have an index on reviews table on (uid, date). A composite index. One index on two columns in this order.
Something like this:
CREATE INDEX IX_uid_date ON reviews (uid, date);
Query
SELECT
t.UserName
,t.BusinessName
,t.city
,t.text
,t.date
FROM
friends
INNER JOIN LATERAL
(
SELECT
users.name AS UserName
,business.name AS BusinessName
,business.city
,reviews.text
,reviews.date
FROM
reviews
INNER JOIN users ON users.uid = reviews.uid
INNER JOIN business ON business.bid = reviews.bid
WHERE
reviews.uid = friends.fid
ORDER BY reviews.date DESC
LIMIT 1
) AS t ON true
WHERE
friends.uid = '{userId}'
;
This should work fine.
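On an engine without LATERAL, the same "latest review per friend" shape can be approximated with a correlated subquery that uses ORDER BY ... LIMIT 1. The following is only a minimal sketch run against an in-memory SQLite database through Python's sqlite3 module; the trimmed-down tables and the sample rows (Ann, Bob, Cy, Cafe, Diner) are hypothetical and exist only to exercise the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users(uid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE business(bid TEXT PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE reviews(rid TEXT PRIMARY KEY, bid TEXT, uid TEXT, date TEXT, text TEXT);
CREATE TABLE friends(uid TEXT, fid TEXT);
-- Composite index suggested in the answer: uid first, then date.
CREATE INDEX IX_uid_date ON reviews (uid, date);

-- Hypothetical rows, just enough to exercise the query.
INSERT INTO users VALUES ('u1', 'Ann'), ('u2', 'Bob'), ('u3', 'Cy');
INSERT INTO business VALUES ('b1', 'Cafe', 'Tempe'), ('b2', 'Diner', 'Mesa');
INSERT INTO friends VALUES ('u1', 'u2'), ('u1', 'u3');
INSERT INTO reviews VALUES
  ('r1', 'b1', 'u2', '2019-01-01', 'old'),
  ('r2', 'b2', 'u2', '2019-02-01', 'new'),
  ('r3', 'b1', 'u3', '2019-01-15', 'only');
""")

# For each friend row, pick exactly one review: the one with the latest date.
# Dates are stored as ISO-8601 text, so string ordering matches date ordering.
LATEST_PER_FRIEND = """
SELECT u.name, b.name, b.city, r.text, r.date
FROM friends AS f
JOIN reviews AS r
  ON r.rid = (SELECT x.rid FROM reviews AS x
              WHERE x.uid = f.fid
              ORDER BY x.date DESC
              LIMIT 1)
JOIN users AS u ON u.uid = r.uid
JOIN business AS b ON b.bid = r.bid
WHERE f.uid = ?
ORDER BY u.name
"""

rows = conn.execute(LATEST_PER_FRIEND, ("u1",)).fetchall()
for row in rows:
    print(row)
```

Because the subquery returns a single rid, a friend with two reviews on the same date still yields one row, which mirrors the LIMIT 1 inside the lateral subquery.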
SELECT name FROM employees as E
WHERE E.uid IN (SELECT uid FROM employees WHERE name = 'John')
You do not need an equality comparison against a single value here; IN matches against every value the subquery returns.
Following up on Manos' answer, I'm not sure I understand why you need to limit each fid at all.
SELECT U.name, B.name, B.city, R.text, R.date
FROM business AS B
INNER JOIN reviews AS R ON B.bid = R.bid
INNER JOIN users AS U ON R.uid = U.uid
WHERE (R.date = (SELECT MAX(X.date) FROM reviews AS X WHERE X.uid = R.uid))
AND (R.uid IN (SELECT fid FROM friends));
If your issue is that your query only returns one row, you should remove the "where uid =" condition to get results for all uids.
I arrived at an answer at roughly the same time as Vladimir Baranov, but I will post my version as well. I don't promise it to be pretty:
SELECT R.name as user_name, B.name as business_name, B.City, R.text
FROM (SELECT bid, name, text
FROM (SELECT R.rid, R.bid, R.uid, R.text, max_date
FROM reviews as R INNER JOIN
(SELECT uid, MAX(date) as max_date FROM reviews WHERE uid IN (SELECT fid from friends where uid = 'BfcNxKpnF9z5wJLXY7elRg') GROUP BY uid) sub
ON R.uid = sub.uid AND R.date = sub.max_date) as review_info
INNER JOIN users
on review_info.uid = users.uid) as R
INNER JOIN business as B
ON R.bid = B.bid
After examining the schema you posted, I used MySQL to create the database and populate the tables with the following sample data:
INSERT INTO users (uid, name) VALUES
('user1', 'user1 name'),
('user2', 'user2 name'),
('user3', 'user3 name'),
('user4', 'user4 name'),
('user5', 'user5 name');
INSERT INTO friends (uid, fid) VALUES
('user1', 'user2'), ('user1', 'user3'),
('user2', 'user4'), ('user2', 'user5');
INSERT INTO business (bid, name, city) VALUES
('b1', 'business 1', 'city 1'),
('b2', 'business 2', 'city 2'),
('b3', 'business 3', 'city 3'),
('b4', 'business 4', 'city 4');
INSERT INTO reviews (rid, bid, uid, stars, date, text) VALUES
('r1', 'b1', 'user1', 5, '2019-05-01', 'blah'),
('r2', 'b2', 'user1', 5, '2019-05-02', 'blah'),
('r3', 'b3', 'user1', 5, '2019-05-03', 'blah'),
('r4', 'b1', 'user2', 4, '2019-05-11', 'blah'),
('r5', 'b2', 'user3', 3, '2019-05-12', 'blah'),
('r6', 'b1', 'user4', 5, '2019-05-13', 'blah');
This allowed me to verify that the original solution I proposed was correct by executing the query in MySQL Workbench.
I assume that the 'failure to finish' you mention has nothing to do with the query per se, but is rather a temporary failure of the DB connection api you use.
Note that the code is updated to incorporate Mihail Shishkov's proposal for using parameters.
-- Display review information originating from friends of user1
-- DECLARE @UID varchar(40); -- Uncomment for MS-SQL (variables need to be declared)
SET @UID = 'user1';
SELECT U.name, B.name, B.city, R.text, R.date
FROM business AS B
INNER JOIN reviews AS R ON B.bid = R.bid
INNER JOIN users AS U ON R.uid = U.uid
WHERE (R.date = (SELECT MAX(X.date) FROM reviews AS X WHERE (X.uid = R.uid)))
AND (R.uid IN (SELECT F.fid FROM friends AS F WHERE (F.uid = @UID)));
Based on the sample data and using 'user1' as the value for the @UID parameter, the results of the query are:
name        name        city    text  date
--------------------------------------------------
user2 name  business 1  city 1  blah  2019-05-11
user3 name  business 2  city 2  blah  2019-05-12
Moreover, I assume that friendship is a two-way relationship in the context of your schema (as in the real world), meaning that friendship between 'user1' and 'user2' only needs to be defined by a single record in table 'friends' with the values ('user1', 'user2') and the reverse ('user2', 'user1') is unnecessary.
So, for the sake of completeness, you can use the following query:
-- Display review information originating from friends of user2
SET @UID = 'user2';
SELECT U.name, B.name, B.city, R.text, R.date
FROM business AS B
INNER JOIN reviews AS R ON B.bid = R.bid
INNER JOIN users AS U ON R.uid = U.uid
WHERE (R.date = (SELECT MAX(X.date) FROM reviews AS X WHERE (X.uid = R.uid)))
AND (R.uid IN (SELECT F.fid FROM friends AS F WHERE (F.uid = @UID) UNION
SELECT F.uid FROM friends AS F WHERE (F.fid = @UID)));
Now, using 'user2' as the value for the @UID parameter and the extended version of the query, we obtain the following results:
name        name        city    text  date
--------------------------------------------------
user1 name  business 3  city 3  blah  2019-05-03
user4 name  business 1  city 1  blah  2019-05-13
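The two result sets above can be re-checked outside MySQL. The following is a rough sketch that loads the same sample rows into an in-memory SQLite database via Python's sqlite3 module and runs the extended (two-way friendship) query with a bound parameter. The trimmed column list, the :uid parameter name, the added ORDER BY, and the helper name latest_friend_reviews are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users(uid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE business(bid TEXT PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE reviews(rid TEXT PRIMARY KEY, bid TEXT, uid TEXT,
                     stars REAL, date TEXT, text TEXT);
CREATE TABLE friends(uid TEXT, fid TEXT);

INSERT INTO users (uid, name) VALUES
  ('user1', 'user1 name'), ('user2', 'user2 name'), ('user3', 'user3 name'),
  ('user4', 'user4 name'), ('user5', 'user5 name');
INSERT INTO friends (uid, fid) VALUES
  ('user1', 'user2'), ('user1', 'user3'),
  ('user2', 'user4'), ('user2', 'user5');
INSERT INTO business (bid, name, city) VALUES
  ('b1', 'business 1', 'city 1'), ('b2', 'business 2', 'city 2'),
  ('b3', 'business 3', 'city 3'), ('b4', 'business 4', 'city 4');
INSERT INTO reviews (rid, bid, uid, stars, date, text) VALUES
  ('r1', 'b1', 'user1', 5, '2019-05-01', 'blah'),
  ('r2', 'b2', 'user1', 5, '2019-05-02', 'blah'),
  ('r3', 'b3', 'user1', 5, '2019-05-03', 'blah'),
  ('r4', 'b1', 'user2', 4, '2019-05-11', 'blah'),
  ('r5', 'b2', 'user3', 3, '2019-05-12', 'blah'),
  ('r6', 'b1', 'user4', 5, '2019-05-13', 'blah');
""")

# Same query as above, with :uid bound by the driver instead of a SQL variable.
# ORDER BY is added here only to make the row order deterministic.
QUERY = """
SELECT U.name, B.name, B.city, R.text, R.date
FROM business AS B
INNER JOIN reviews AS R ON B.bid = R.bid
INNER JOIN users AS U ON R.uid = U.uid
WHERE R.date = (SELECT MAX(X.date) FROM reviews AS X WHERE X.uid = R.uid)
  AND R.uid IN (SELECT F.fid FROM friends AS F WHERE F.uid = :uid
                UNION
                SELECT F.uid FROM friends AS F WHERE F.fid = :uid)
ORDER BY U.name
"""

def latest_friend_reviews(uid):
    """Return one row per friend of `uid`, treating friendship as two-way."""
    return conn.execute(QUERY, {"uid": uid}).fetchall()

for uid in ("user1", "user2"):
    print(uid, latest_friend_reviews(uid))
```

Note that user5 is a friend of user2 but has no reviews, so the inner joins leave that friend out of the result.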
I would appreciate it if you acknowledge the answer as acceptable.
I want to retrieve data from two tables like below. I have a Products table which has P_id, P_name columns and a BATCH table with p_id_fk as a foreign key to the Products table.
This is my query. I also want to retrieve the product's name from the Products table, since the Batch table stores the Products table's primary key as a foreign key.
SqlDataAdapter sda = new SqlDataAdapter("Select batch_id, quantity, left_qty, purchaseDate, manufacturing_date, expiryDate from batch where Convert(DATE, expiryDate, 103) BETWEEN @from AND @to", con);
sda.SelectCommand.Parameters.AddWithValue("@from", Convert.ToDateTime(datePicker1.SelectedDate.Value).ToString("yyyyMMdd"));
sda.SelectCommand.Parameters.AddWithValue("@to", Convert.ToDateTime(datePicker2.SelectedDate.Value).ToString("yyyyMMdd"));
If you want to retrieve data from two tables, you need to use a SQL JOIN.
I am not sure of the exact makeup of your tables, but something like the below:
Select batch_id,
product_name,
quantity,
left_qty,
purchaseDate,
manufacturing_date,
expiryDate
from batch B
INNER JOIN Products P
ON P.P_id = B.P_id
where Convert(DATE,expiryDate,103) BETWEEN @from AND @to
You need a join or cross apply here.
Option 1 - inner join:
Select
b.batch_id,pd.product_name,quantity,left_qty,
purchaseDate,manufacturing_date,expiryDate from batch b
inner join product pd on pd.p_id = b.p_id where Convert(DATE,expiryDate,103)
BETWEEN @from AND @to
Option 2 cross apply:
Select
b.batch_id,pd.product_name,quantity,left_qty,
purchaseDate,manufacturing_date,expiryDate from batch b
cross apply
(
select product_name from product p
where p.p_id = b.p_id
) pd
where Convert(DATE,expiryDate,103)
BETWEEN @from AND @to
for more about cross apply look here.
Not sure if I understood your question correctly, but I believe you are looking for something as simple as a JOIN between the Products and Batch tables:
SELECT
P.P_id,
P.P_name,
B.batch_id,
B.product_name,
B.quantity,
B.left_qty,
B.purchaseDate,
B.manufacturing_date,
B.expiryDate
FROM Batch AS B
INNER JOIN Products AS P
ON B.p_id_fk = P.P_id
WHERE CONVERT(DATE, B.expiryDate, 103) BETWEEN @from AND @to
The p_id_fk name you provided might not be an actual column name in the Batch table; judging by the naming convention (_fk suffix), it may be the name of the foreign key constraint itself.
I have the following SQL Table:
Name      Description  Id  UserId   CreatedDate
UserSet1  Desc1        1   Abc      06/01/2018
UserSet1  Desc2        2   Def      06/02/2018
UserSet2  Desc for 2   5   NewUser  06/04/2018
UserSet2  Desc for 2   7   NewUser  06/19/2018
What I want to extract from the above table is just the latest Id for each Name, so that I get the following output:
Name      Description  Id  UserId   CreatedDate
UserSet1  Desc2        2   Def      06/02/2018
UserSet2  Desc for 2   7   NewUser  06/19/2018
Since Ids 2 and 7 are the latest entries in the table for UserSet1 and UserSet2, I would like to display those instead of all the entries in the table.
Any input on how I can get the desired result?
I am open to solutions that return the output directly, or to LINQ (C#) solutions as well, i.e. returning the entire dataset and then using LINQ to filter it as above.
EDIT: Since you are looking for the highest number ID, the GROUP BY method would probably be easier to work with.
Using a window function:
SELECT *
FROM (
SELECT Name, Description, Id, UserId, CreatedDate
, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY CreatedDate DESC) AS rn
FROM myTable
) s1
WHERE rn = 1
I don't have an instance of DynamoDB to test on, but I believe it can use the ROW_NUMBER() window function.
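As a quick way to try the window-function query without a database server, here is a small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The rows are the ones from the question; storing CreatedDate as ISO-8601 text is an assumption of this sketch so that text ordering matches date ordering, and the outer ORDER BY is added only for a stable result order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable(Name TEXT, Description TEXT, Id INTEGER,
                     UserId TEXT, CreatedDate TEXT);
-- Rows from the question, with dates rewritten as YYYY-MM-DD (assumption).
INSERT INTO myTable VALUES
  ('UserSet1', 'Desc1',      1, 'Abc',     '2018-06-01'),
  ('UserSet1', 'Desc2',      2, 'Def',     '2018-06-02'),
  ('UserSet2', 'Desc for 2', 5, 'NewUser', '2018-06-04'),
  ('UserSet2', 'Desc for 2', 7, 'NewUser', '2018-06-19');
""")

# Number the rows within each Name from newest to oldest, then keep row 1.
rows = conn.execute("""
SELECT Name, Description, Id, UserId, CreatedDate
FROM (
    SELECT Name, Description, Id, UserId, CreatedDate,
           ROW_NUMBER() OVER (PARTITION BY Name
                              ORDER BY CreatedDate DESC) AS rn
    FROM myTable
) AS s1
WHERE rn = 1
ORDER BY Name
""").fetchall()

for row in rows:
    print(row)
```

Ordering the window by Id DESC instead of CreatedDate DESC would select by highest Id, which gives the same two rows for this data.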
Thanks, everyone, for pointing me in the right direction. I have got this working with the LINQ and C# code below:
var results = response.GroupBy(row => row.Name)
.SelectMany(g => g.OrderByDescending(row => row.Id).Take(1));
In initial tests this seems to be working. Let me know if you think this has some issues.
This should be a general SQL answer:
SELECT * FROM yourtable Y1
WHERE Id = (SELECT MAX(Id)
FROM yourtable Y2
WHERE Y2.Name = Y1.Name)
If it were MS SQL you could use PARTITION BY; otherwise the most performant way would be:
select * from Table
where Id in (
select Max(Id) from Table
Group By Name
)
Not sure if you can leave Name out of the SELECT list; you might need to do:
select * from Table
where Id in (
Select Id from
(select Name, Max(Id) as Id from Table
Group By Name) as latest -- most engines require an alias on a derived table
)
I'm stuck on a task where I have to transform a Stored Procedure into a LINQ query.
The model:
AccountSet: Account table with columns 'AccountId', 'ParentAccountId' (references an 'AccountId') and 'Name'
ContactSet: Contact table with columns 'ParentCustomerId'
(references an Account via 'AccountId')
The Stored Procedure:
It should search for all accounts with the given id
Search all parents (recursive) for the accounts found in step 1
Fetch all contacts that have a ParentCustomerId matching an 'AccountId' found in step 2
CREATE PROCEDURE [dbo].[sp_GetContactsForCompany]
(
@projectid AS UNIQUEIDENTIFIER
)
AS WITH recursion ( AccountId, Name, ParentAccountId )
AS (
SELECT AccountId, Name, ParentAccountId
FROM dbo.AccountBase
WHERE AccountId = @projectid
UNION ALL
SELECT a.AccountId, a.Name, a.ParentAccountId
FROM dbo.AccountBase AS a
INNER JOIN recursion AS b ON a.ParentAccountId = b.AccountId
)
SELECT ContactId, FullName
FROM dbo.ContactBase
WHERE ParentCustomerId IN (
SELECT AccountId
FROM recursion
)
ORDER BY FullName
LINQ:
from a in allAccs
where a.AccountId == id
select a;
This gives me all the accounts with the given id. But now I have no idea how to apply the join and recursion.
Any hint would be great.
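As a language-neutral hint, the traversal the CTE performs can be written as a plain loop. The sketch below is in Python rather than LINQ and works on hypothetical in-memory rows; the sample data and the function names account_tree_ids and contacts_for_company are made up for illustration. Note that, as written, the recursive member joins a.ParentAccountId = b.AccountId, so the procedure walks from the starting account down to its child accounts.

```python
from collections import deque

# Hypothetical in-memory rows standing in for AccountSet and ContactSet.
accounts = [
    {"AccountId": 1, "ParentAccountId": None, "Name": "Root"},
    {"AccountId": 2, "ParentAccountId": 1, "Name": "Child"},
    {"AccountId": 3, "ParentAccountId": 2, "Name": "Grandchild"},
    {"AccountId": 4, "ParentAccountId": None, "Name": "Unrelated"},
]
contacts = [
    {"ContactId": 10, "FullName": "Zoe", "ParentCustomerId": 3},
    {"ContactId": 11, "FullName": "Adam", "ParentCustomerId": 1},
    {"ContactId": 12, "FullName": "Mia", "ParentCustomerId": 4},
]

def account_tree_ids(accounts, start_id):
    """Collect start_id plus every account reachable through ParentAccountId."""
    # Anchor member of the CTE: the account(s) with the given id.
    found = {a["AccountId"] for a in accounts if a["AccountId"] == start_id}
    queue = deque(found)
    # Recursive member: accounts whose parent is already in the result set.
    while queue:
        current = queue.popleft()
        for a in accounts:
            if a["ParentAccountId"] == current and a["AccountId"] not in found:
                found.add(a["AccountId"])
                queue.append(a["AccountId"])
    return found

def contacts_for_company(accounts, contacts, start_id):
    """Contacts whose ParentCustomerId is in the account tree, by FullName."""
    ids = account_tree_ids(accounts, start_id)
    matches = [c for c in contacts if c["ParentCustomerId"] in ids]
    return sorted(matches, key=lambda c: c["FullName"])

result = contacts_for_company(accounts, contacts, 1)
print([c["FullName"] for c in result])
```

The "not in found" check is an addition that the CTE does not have; it keeps the loop from running forever if the data ever contains a cycle. In LINQ the same shape would be a loop that repeatedly queries for accounts whose ParentAccountId is in the set found so far.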
LAYOUT:
I have a Subscriber database with Subscriber info in a table, all with unique AccountIDs.
I have multiple History databases with a History table in each, all pertaining to the AccountIDs in the Subscriber database.
I NEED:
I need a list of the most recent History record entered, in any of the History databases, for each AccountID in the Subscriber data. 1 record per AccountID.
I can achieve this with multiple hits to the database, but there are potentially millions of records and that doesn't sit well in my head. I want to make this happen in one hit.
Help. Me. Thanks.
Here's something I have tried already, but it doesn't give me a single record per AccountID...
SELECT
MAIN.*,
ISNULL(SubData.Name, '') AS [Name],
ISNULL(SubData.AcctLineCode, '') AS AcctLineCode,
ISNULL(LTRIM(RTRIM(SubData.AcctNum)), '') AS AcctNum
FROM
(
SELECT AccountID, AlarmDate, AlarmCode FROM [History1113]..SignalHistory WHERE AccountID IN (SELECT DISTINCT AccountID FROM Subscriber..[Subscriber Data])
UNION
SELECT AccountID, AlarmDate, AlarmCode FROM [History1013]..SignalHistory WHERE AccountID IN (SELECT DISTINCT AccountID FROM Subscriber..[Subscriber Data])
UNION
SELECT AccountID, AlarmDate, AlarmCode FROM [History0913]..SignalHistory WHERE AccountID IN (SELECT DISTINCT AccountID FROM Subscriber..[Subscriber Data])
)
AS MAIN
LEFT JOIN Subscriber..[Subscriber Data] AS SubData ON Main.AccountID = SubData.AccountID
ORDER BY AccountID, AlarmDate DESC
I'd do it as a view. The biggest issue will be making sure the view can see all the history tables if they are in separate databases. You may have to get into linked servers.
Create view historytable
as
select * from historytable1
union all
select * from historytable2
union all
etc...
Now query from historytable as if it was a table with all rows in it.
Edit:
The statement you've added has no aggregates, so it has no way of filtering down (or grouping) to one record per account.
To your reply:
Let's call my view above main so I don't have to type so much.
Select account_id, max(alarm_date) as maxdate from main group by account_id
This simple select brings back the most recent alarm date for each account. Inner join it so it functions as a filter.
select ...
from main
inner join (Select account_id, max(alarm_date) as maxdate from main group by account_id) maxdate
on main.account_id = maxdate.account_ID and maxdate.maxdate = main.alarm_date
Add your subscriber join to the bottom of that and fill in the columns you need.
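To see the view-plus-filter idea end to end, here is a minimal sketch using Python's sqlite3 module. Everything in it is a stand-in: the two history tables live in one in-memory database instead of separate databases, the view is named all_history (a hypothetical name), and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-ins for the per-month history tables (hypothetical data).
CREATE TABLE historytable1(account_id INTEGER, alarm_date TEXT, alarm_code TEXT);
CREATE TABLE historytable2(account_id INTEGER, alarm_date TEXT, alarm_code TEXT);
INSERT INTO historytable1 VALUES (1, '2013-09-05', 'A'), (2, '2013-09-07', 'B');
INSERT INTO historytable2 VALUES (1, '2013-10-02', 'C'), (1, '2013-10-01', 'D');

-- One view that exposes every history row as if it were a single table.
CREATE VIEW all_history AS
  SELECT * FROM historytable1
  UNION ALL
  SELECT * FROM historytable2;
""")

# The grouped subquery finds each account's latest date; the inner join
# then keeps only the history rows that match that date.
rows = conn.execute("""
SELECT h.account_id, h.alarm_date, h.alarm_code
FROM all_history AS h
INNER JOIN (SELECT account_id, MAX(alarm_date) AS maxdate
            FROM all_history
            GROUP BY account_id) AS latest
  ON h.account_id = latest.account_id
 AND h.alarm_date = latest.maxdate
ORDER BY h.account_id
""").fetchall()

for row in rows:
    print(row)
```

If two rows for the same account share the latest alarm date, this join returns both of them, so a tie-breaker would be needed to guarantee exactly one record per account.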
With a little help from a couple of you, I was able to figure this out. So, thank you all.
Here's a code snippet of how I got it to work. I still need to do some joins to bring in account info, but this was the hard part.
SELECT MAIN.AccountID, MAX(MAIN.AlarmDate) AS AlarmDate FROM
(
SELECT AccountID, MAX(AlarmDate) AS AlarmDate FROM [History1113]..SignalHistory WHERE AccountID IN (SELECT DISTINCT AccountID FROM Subscriber..[Subscriber Data])
GROUP BY AccountID
UNION
SELECT AccountID, MAX(AlarmDate) AS AlarmDate FROM [History1013]..SignalHistory WHERE AccountID IN (SELECT DISTINCT AccountID FROM Subscriber..[Subscriber Data])
GROUP BY AccountID
UNION
SELECT AccountID, MAX(AlarmDate) AS AlarmDate FROM [History0913]..SignalHistory WHERE AccountID IN (SELECT DISTINCT AccountID FROM Subscriber..[Subscriber Data])
GROUP BY AccountID
)
AS MAIN
GROUP BY MAIN.AccountID
I am trying to show the last order for a specific customer in a grid view. What I have so far shows all orders for the customer, but I need only the last one.
Here is my SQL code:
SELECT orders.order_id, orders.order_date,
orders.payment_type, orders.cardnumber, packages.Package_name,
orders.package_id, packages.package_price
FROM orders INNER JOIN packages ON orders.package_id = packages.Package_ID
WHERE (orders.username = @username )
@username gets its value from a cookie. How can I select only the last order for a cookie value of "Tony", for example?
To generalize (and fix a little bit) Mitch's answer, you need to use a SELECT clause with TOP(@N) and ORDER BY ... DESC. Note that I use TOP(@N), not TOP N, which means you can pass it as an argument to the stored procedure and return, say, not 1 but the last N orders:
CREATE PROCEDURE ...
@N int
...
SELECT TOP(@N) ...
ORDER BY ... DESC
SELECT top 1
orders.order_id,
orders.order_date,
orders.payment_type,
orders.cardnumber,
packages.Package_name,
orders.package_id,
packages.package_price
FROM orders
INNER JOIN packages ON orders.package_id = packages.Package_ID
WHERE (orders.username = @username )
ORDER BY orders.order_date DESC
In fact assuming orders.order_id is an Identity column:
SELECT top 1
orders.order_id,
orders.order_date,
orders.payment_type,
orders.cardnumber,
packages.Package_name,
orders.package_id,
packages.package_price
FROM orders
INNER JOIN packages ON orders.package_id = packages.Package_ID
WHERE (orders.username = @username )
ORDER BY orders.order_id DESC
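For trying the "last N orders" idea without SQL Server, here is a rough sketch with Python's sqlite3 module. SQLite has no TOP, so LIMIT with a bound parameter stands in for TOP(@N); the reduced column set, the sample rows, and the helper name last_orders are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages(Package_ID INTEGER PRIMARY KEY, Package_name TEXT,
                      package_price REAL);
CREATE TABLE orders(order_id INTEGER PRIMARY KEY AUTOINCREMENT,
                    order_date TEXT, username TEXT, package_id INTEGER);
-- Hypothetical rows; order_id is assigned automatically in insert order.
INSERT INTO packages VALUES (1, 'Basic', 10.0), (2, 'Plus', 20.0);
INSERT INTO orders (order_date, username, package_id) VALUES
  ('2013-01-05', 'Tony', 1),
  ('2013-02-10', 'Tony', 2),
  ('2013-03-01', 'Ann', 1);
""")

def last_orders(username, n=1):
    """Return the n most recent orders for username, newest first."""
    return conn.execute("""
        SELECT o.order_id, o.order_date, p.Package_name, p.package_price
        FROM orders AS o
        INNER JOIN packages AS p ON o.package_id = p.Package_ID
        WHERE o.username = :username
        ORDER BY o.order_date DESC
        LIMIT :n
    """, {"username": username, "n": n}).fetchall()

print(last_orders("Tony"))       # only the latest order
print(last_orders("Tony", n=2))  # the two latest orders
```

Both the username and the row count are bound as parameters, so the value read from the cookie never ends up concatenated into the SQL text.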