I am working with the Yelp data-set available online. I've been trying to optimize my query for days. For the schema I'll list below, I need to construct a query to provide the following:
Given a user's UID, display the most recent review information for each of the user's friends.
Here's the schema:
CREATE TABLE business(
bid varchar(40) PRIMARY KEY,
name varchar(100),
city varchar(40),
state char(2),
zip varchar(10),
latitude real,
longitude real,
address varchar(100),
numreviews INTEGER DEFAULT 0,
numcheckins INTEGER DEFAULT 0,
avgreview float DEFAULT 0,
isopen bool,
stars float
);
CREATE TABLE users(
uid varchar(40) PRIMARY KEY,
name varchar(40),
avgstars float,
fans INTEGER,
coolvotes INTEGER,
reviewcount INTEGER,
funnyvotes INTEGER,
signup varchar(20),
usefulvotes INTEGER,
latitude real,
longitude real
);
CREATE TABLE reviews(
rid varchar(40) PRIMARY KEY,
bid varchar(40),
uid varchar(40),
stars float,
date varchar(20),
funny INTEGER,
useful INTEGER,
cool INTEGER,
text varchar(1024),
FOREIGN KEY (uid) REFERENCES users(uid),
FOREIGN KEY (bid) REFERENCES business(bid)
);
CREATE TABLE friends(
uid varchar(40) REFERENCES users(uid),
fid varchar(40) REFERENCES users(uid)
);
Here's an example of the desired output:
For each of the user's friends, I display the following:
The friend's name
The name of the business from their most recent review
The city of the business from their most recent review
The text from their most recent review
Currently this is the only "solution" I've had success with.
Step 1: Get a list of all of the IDs for each of the user's friends.
SELECT fid from friends where uid = '{userId}'
This returns the user IDs of all of the user's friends, so I basically have a friend ID list.
Step 2: With this information, I run a foreach loop in my program over that list. For each iteration of the friend ID list, I execute the below query and provide the temporary friend ID for the current iteration of the loop:
SELECT U.name, B.name, B.city, R.text, R.date FROM reviews as R, users as U, business as B
WHERE U.uid = '{currentFriendId}'
AND R.uid = '{currentFriendId}'
AND B.bid = R.bid
AND date = (SELECT MAX(date) FROM reviews WHERE uid = '{currentFriendId}')
Each iteration of this loop returns a single row of the output I want.
This is great...except I have to run this query for every single one of the user's friends. This is extremely costly.
Goal: I'm trying to combine these 2 queries, or revamp them completely, to generate all of the rows at once in a single query.
Question: Given the information provided, how can I fix my queries to generate all of this information from a single query?
It looks like a top-n-per-group problem.
One way to do it is to use a lateral join.
Make sure you have a composite index on the reviews table on (uid, date): one index on two columns, in this order.
Something like this:
CREATE INDEX IX_uid_date ON reviews (uid, date);
Query
SELECT
t.UserName
,t.BusinessName
,t.city
,t.text
,t.date
FROM
friends
INNER JOIN LATERAL
(
SELECT
users.name AS UserName
,business.name AS BusinessName
,business.city
,reviews.text
,reviews.date
FROM
reviews
INNER JOIN users ON users.uid = reviews.uid
INNER JOIN business ON business.bid = reviews.bid
WHERE
reviews.uid = friends.fid
ORDER BY reviews.date DESC
LIMIT 1
) AS t ON true
WHERE
friends.uid = '{userId}'
;
This should work fine.
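For engines without LATERAL, the same top-n-per-group idea can be sketched with a window function. This is an illustration under stated assumptions, not the method of the answer above: it swaps in ROW_NUMBER() on SQLite (3.25 or newer) through Python's sqlite3 module, and the helper names build_demo_db and latest_review_per_friend, as well as the sample rows, are made up for the demo.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a trimmed-down copy of the schema.

    The rows below are invented for this sketch; they are not Yelp data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users(uid TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE business(bid TEXT PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE reviews(rid TEXT PRIMARY KEY, bid TEXT, uid TEXT,
                             date TEXT, text TEXT);
        CREATE TABLE friends(uid TEXT, fid TEXT);
        CREATE INDEX IX_uid_date ON reviews (uid, date);

        INSERT INTO users VALUES ('u1', 'Ann'), ('u2', 'Bob'), ('u3', 'Cy');
        INSERT INTO business VALUES ('b1', 'Cafe', 'Phoenix'),
                                    ('b2', 'Diner', 'Tempe');
        INSERT INTO friends VALUES ('u1', 'u2'), ('u1', 'u3');
        INSERT INTO reviews VALUES
            ('r1', 'b1', 'u2', '2019-05-01', 'old'),
            ('r2', 'b2', 'u2', '2019-05-11', 'new'),
            ('r3', 'b1', 'u3', '2019-05-12', 'only');
        """
    )
    return conn


def latest_review_per_friend(conn, user_id):
    """Return one row per friend of user_id, holding that friend's newest review.

    Assumes dates are ISO strings (YYYY-MM-DD) so that text ordering matches
    chronological ordering. Ties on date are broken by rid, which is an
    arbitrary choice made for this sketch.
    """
    query = """
        SELECT user_name, business_name, city, review_text, review_date
        FROM (
            SELECT u.name AS user_name,
                   b.name AS business_name,
                   b.city AS city,
                   r.text AS review_text,
                   r.date AS review_date,
                   ROW_NUMBER() OVER (
                       PARTITION BY r.uid
                       ORDER BY r.date DESC, r.rid DESC
                   ) AS rn
            FROM friends AS f
            INNER JOIN reviews  AS r ON r.uid = f.fid
            INNER JOIN users    AS u ON u.uid = r.uid
            INNER JOIN business AS b ON b.bid = r.bid
            WHERE f.uid = ?
        ) AS ranked
        WHERE rn = 1
        ORDER BY user_name
    """
    return conn.execute(query, (user_id,)).fetchall()


if __name__ == "__main__":
    for row in latest_review_per_friend(build_demo_db(), "u1"):
        print(row)
```

Unlike a date = MAX(date) comparison, the row-number filter returns exactly one row per friend even when two reviews share the same date.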
SELECT name FROM employees as E
WHERE E.uid IN (SELECT uid FROM employees WHERE name = 'John')
You do not need an equality comparison against a single value here; IN matches each row against every value the subquery returns.
Following up on Manos' answer: I'm not sure I understand why you need to limit each fid at all.
SELECT U.name, B.name, B.city, R.text, R.date
FROM business AS B
INNER JOIN reviews AS R ON B.bid = R.bid
INNER JOIN users AS U ON R.uid = U.uid
WHERE (R.date = (SELECT MAX(X.date) FROM reviews AS X WHERE X.uid = R.uid))
AND (R.uid IN (SELECT fid FROM friends));
If your issue is that your query returns only one row, remove that where uid = condition to get results for all uids.
I arrived at an answer at roughly the same time as Vladimir Baranov, but I will post my version as well. I don't promise that it is pretty:
SELECT R.name as user_name, B.name as business_name, B.City, R.text
FROM (SELECT bid, name, text
FROM (SELECT R.rid, R.bid, R.uid, R.text, max_date
FROM reviews as R INNER JOIN
(SELECT uid, MAX(date) as max_date FROM reviews WHERE uid IN (SELECT fid from friends where uid = 'BfcNxKpnF9z5wJLXY7elRg') GROUP BY uid) sub
ON R.uid = sub.uid AND R.date = sub.max_date) as review_info
INNER JOIN users
on review_info.uid = users.uid) as R
INNER JOIN business as B
ON R.bid = B.bid
After examining the schema you posted, I used MySQL to create the database and populate the tables with the following sample data:
INSERT INTO users (uid, name) VALUES
('user1', 'user1 name'),
('user2', 'user2 name'),
('user3', 'user3 name'),
('user4', 'user4 name'),
('user5', 'user5 name');
INSERT INTO friends (uid, fid) VALUES
('user1', 'user2'), ('user1', 'user3'),
('user2', 'user4'), ('user2', 'user5');
INSERT INTO business (bid, name, city) VALUES
('b1', 'business 1', 'city 1'),
('b2', 'business 2', 'city 2'),
('b3', 'business 3', 'city 3'),
('b4', 'business 4', 'city 4');
INSERT INTO reviews (rid, bid, uid, stars, date, text) VALUES
('r1', 'b1', 'user1', 5, '2019-05-01', 'blah'),
('r2', 'b2', 'user1', 5, '2019-05-02', 'blah'),
('r3', 'b3', 'user1', 5, '2019-05-03', 'blah'),
('r4', 'b1', 'user2', 4, '2019-05-11', 'blah'),
('r5', 'b2', 'user3', 3, '2019-05-12', 'blah'),
('r6', 'b1', 'user4', 5, '2019-05-13', 'blah');
This allowed me to verify that the original solution I proposed was correct by executing the query in MySQL Workbench.
I assume that the 'failure to finish' you mention has nothing to do with the query per se, but is rather a temporary failure of the DB connection api you use.
Note that the code is updated to incorporate Mihail Shishkov's proposal for using parameters.
-- Display review information originating from friends of user1
-- DECLARE @UID varchar(40); -- Uncomment for MS-SQL (variables need to be declared)
SET @UID = 'user1';
SELECT U.name, B.name, B.city, R.text, R.date
FROM business AS B
INNER JOIN reviews AS R ON B.bid = R.bid
INNER JOIN users AS U ON R.uid = U.uid
WHERE (R.date = (SELECT MAX(X.date) FROM reviews AS X WHERE (X.uid = R.uid)))
AND (R.uid IN (SELECT F.fid FROM friends AS F WHERE (F.uid = @UID)));
Based on the sample data and using 'user1' as the value for the @UID parameter, the results of the query are:
name        name        city    text  date
------------------------------------------------
user2 name  business 1  city 1  blah  2019-05-11
user3 name  business 2  city 2  blah  2019-05-12
Moreover, I assume that friendship is a two-way relationship in the context of your schema (as in the real world), meaning that friendship between 'user1' and 'user2' only needs to be defined by a single record in table 'friends' with the values ('user1', 'user2') and the reverse ('user2', 'user1') is unnecessary.
So, for the sake of completeness, you can use the following query:
-- Display review information originating from friends of user2
SET @UID = 'user2';
SELECT U.name, B.name, B.city, R.text, R.date
FROM business AS B
INNER JOIN reviews AS R ON B.bid = R.bid
INNER JOIN users AS U ON R.uid = U.uid
WHERE (R.date = (SELECT MAX(X.date) FROM reviews AS X WHERE (X.uid = R.uid)))
AND (R.uid IN (SELECT F.fid FROM friends AS F WHERE (F.uid = @UID) UNION
SELECT F.uid FROM friends AS F WHERE (F.fid = @UID)));
Now, using 'user2' as the value for the @UID parameter and the extended version of the query, we obtain the following results:
name        name        city    text  date
------------------------------------------------
user1 name  business 3  city 3  blah  2019-05-03
user4 name  business 1  city 1  blah  2019-05-13
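Both variants of the query can be checked outside MySQL as well. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module stands in for MySQL and a bound :uid parameter replaces the session variable; the function name friend_reviews, the two_way flag, the trimmed schema, and the added ORDER BY (only there to make the row order deterministic) are assumptions of this sketch. The sample rows are the ones listed above.

```python
import sqlite3

# Trimmed-down schema (only the columns the query touches) plus the sample data.
SCHEMA_AND_DATA = """
CREATE TABLE users(uid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE business(bid TEXT PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE reviews(rid TEXT PRIMARY KEY, bid TEXT, uid TEXT,
                     stars REAL, date TEXT, text TEXT);
CREATE TABLE friends(uid TEXT, fid TEXT);

INSERT INTO users (uid, name) VALUES
    ('user1', 'user1 name'), ('user2', 'user2 name'), ('user3', 'user3 name'),
    ('user4', 'user4 name'), ('user5', 'user5 name');
INSERT INTO friends (uid, fid) VALUES
    ('user1', 'user2'), ('user1', 'user3'),
    ('user2', 'user4'), ('user2', 'user5');
INSERT INTO business (bid, name, city) VALUES
    ('b1', 'business 1', 'city 1'), ('b2', 'business 2', 'city 2'),
    ('b3', 'business 3', 'city 3'), ('b4', 'business 4', 'city 4');
INSERT INTO reviews (rid, bid, uid, stars, date, text) VALUES
    ('r1', 'b1', 'user1', 5, '2019-05-01', 'blah'),
    ('r2', 'b2', 'user1', 5, '2019-05-02', 'blah'),
    ('r3', 'b3', 'user1', 5, '2019-05-03', 'blah'),
    ('r4', 'b1', 'user2', 4, '2019-05-11', 'blah'),
    ('r5', 'b2', 'user3', 3, '2019-05-12', 'blah'),
    ('r6', 'b1', 'user4', 5, '2019-05-13', 'blah');
"""

# Friend lookup in one direction, and in both directions via UNION.
ONE_WAY = "SELECT F.fid FROM friends AS F WHERE F.uid = :uid"
TWO_WAY = ONE_WAY + " UNION SELECT F.uid FROM friends AS F WHERE F.fid = :uid"


def friend_reviews(conn, uid, two_way=False):
    """Return the most recent review row for each friend of uid."""
    friend_ids = TWO_WAY if two_way else ONE_WAY
    query = f"""
        SELECT U.name, B.name, B.city, R.text, R.date
        FROM business AS B
        INNER JOIN reviews AS R ON B.bid = R.bid
        INNER JOIN users AS U ON R.uid = U.uid
        WHERE R.date = (SELECT MAX(X.date) FROM reviews AS X WHERE X.uid = R.uid)
          AND R.uid IN ({friend_ids})
        ORDER BY U.name
    """
    # The user id is bound as a named parameter, never pasted into the SQL text.
    return conn.execute(query, {"uid": uid}).fetchall()


def build_sample_db():
    """Load the sample rows into a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


if __name__ == "__main__":
    db = build_sample_db()
    print(friend_reviews(db, "user1"))
    print(friend_reviews(db, "user2", two_way=True))
```

Friends without any review, such as user5 here, simply do not appear in the output because of the inner joins.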
I would appreciate it if you acknowledge the answer as acceptable.
I am trying to select customers and their orders in one query, but the resulting datatable repeats the customer columns for each order.
I tried DISTINCT and GROUP BY, but couldn't get it to work.
SQL:
select *
from Customer, Order
where Order.CustomerID = Customer.CustomerID
and Customer.CustomerID = '2'
Since there cannot be different columns for each row you can't do it without having duplicates. Consider reading data separately, once for the customer and once for her orders.
I want to get all customers and their orders, and the number of queries will grow. If I have 3 customers, I want to get the orders and customers in one query, not in 6 query executions.
You do not need to perform a separate query for each customer. You just need a single query for all customers and a single query for all orders. Then you can combine them in the application layer rather than in a single query.
But if you argue that you have too many customers and too many orders to hold them all in memory, well, then you may perform a separate query for each customer. That's a tradeoff between memory and CPU.
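The two-query approach can be sketched as follows. This is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module, the Customers and Orders column names are taken from the queries in this thread, and the function names and sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_shop_db():
    """Create an in-memory database with made-up customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Customers(CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
        CREATE TABLE Orders(OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                            OrderDate TEXT);
        INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO Orders VALUES
            (10, 1, '2020-01-05'),
            (11, 2, '2020-01-06'),
            (12, 2, '2020-02-01');
        """
    )
    return conn


def load_customers_with_orders(conn):
    """Run exactly two queries, then attach each order to its customer in memory."""
    customers = conn.execute(
        "SELECT CustomerID, CustomerName FROM Customers ORDER BY CustomerID"
    ).fetchall()
    orders = conn.execute(
        "SELECT OrderID, CustomerID, OrderDate FROM Orders ORDER BY OrderID"
    ).fetchall()

    # Group the orders by customer id so each lookup below is a dict access.
    orders_by_customer = defaultdict(list)
    for order_id, customer_id, order_date in orders:
        orders_by_customer[customer_id].append((order_id, order_date))

    # Each customer appears once, with a (possibly empty) list of orders.
    return [
        {"id": cid, "name": name, "orders": orders_by_customer.get(cid, [])}
        for cid, name in customers
    ]


if __name__ == "__main__":
    for customer in load_customers_with_orders(build_shop_db()):
        print(customer)
```

The number of queries stays at two no matter how many customers there are, and no customer column is repeated per order.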
This is a very rare query, but this is my understanding of your need :p.
select *
from (
select 'CustomerID' as col1, 'CustomerName' as col2, 'ContactName' as col3,
'Address' as col4, 'City' as col5, 'PostalCode' as col6, 'Country' as col7, 0 as ord
union all
select CustomerID, CustomerName, ContactName, Address, City, PostalCode, Country, 1 as ord
from Customers
union all
select 'OrderId', 'CustomerID', 'EmployeeID', 'OrderDate', 'ShipperID', Null, Null, 0 as ord
union all
select OrderId, CustomerID, EmployeeID, OrderDate, ShipperID, Null, Null, 2 as ord
from Orders) res
In the result with ord = 0 you have titles, with ord = 1 you will have customers only and with ord = 2 you will have orders, and you can use this query with this condition:
where (col1 = @customerId and ord = 1) or (col2 = @customerId and ord = 2)
You can add or ord = 0 if you want the titles in your output.
I am trying to show the last order for a specific customer in a grid view. What I have so far shows all of the customer's orders, but I need only the last one.
Here is my SQL code:
SELECT orders.order_id, orders.order_date,
orders.payment_type, orders.cardnumber, packages.Package_name,
orders.package_id, packages.package_price
FROM orders INNER JOIN packages ON orders.package_id = packages.Package_ID
WHERE (orders.username = @username )
@username gets its value from a cookie. How can I select only the last order for a cookie value such as "Tony"?
To generalize (and slightly fix) Mitch's answer, you need a SELECT clause with TOP(@N) and ORDER BY ... DESC. Note that I use TOP(@N), not TOP N, which means you can pass N as an argument to the stored procedure and return, say, not 1 but the N last orders:
CREATE PROCEDURE ...
@N int
...
SELECT TOP(@N) ...
ORDER BY ... DESC
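The idea of passing N as a parameter can be tried without SQL Server. A rough sketch, assuming SQLite through Python's sqlite3 module: SQLite has no TOP, so LIMIT with a bound parameter is swapped in for TOP(@N). The table and column names follow the question; the function names, the order_id tie-breaker, and the sample rows are hypothetical.

```python
import sqlite3


def build_orders_db():
    """Create an in-memory database with made-up packages and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE packages(Package_ID INTEGER PRIMARY KEY, Package_name TEXT,
                              package_price REAL);
        CREATE TABLE orders(order_id INTEGER PRIMARY KEY, order_date TEXT,
                            username TEXT, package_id INTEGER);
        INSERT INTO packages VALUES (1, 'Basic', 10.0), (2, 'Pro', 25.0);
        INSERT INTO orders VALUES
            (1, '2021-01-01', 'Tony', 1),
            (2, '2021-02-01', 'Tony', 2),
            (3, '2021-03-01', 'Ann', 1),
            (4, '2021-03-15', 'Tony', 1);
        """
    )
    return conn


def last_orders(conn, username, n=1):
    """Return the n most recent orders for username, newest first.

    Both the username and the row count are bound parameters, mirroring
    @username and @N in the T-SQL version.
    """
    query = """
        SELECT orders.order_id, orders.order_date,
               packages.Package_name, packages.package_price
        FROM orders
        INNER JOIN packages ON orders.package_id = packages.Package_ID
        WHERE orders.username = ?
        ORDER BY orders.order_date DESC, orders.order_id DESC
        LIMIT ?
    """
    return conn.execute(query, (username, n)).fetchall()


if __name__ == "__main__":
    db = build_orders_db()
    print(last_orders(db, "Tony"))       # the single latest order
    print(last_orders(db, "Tony", n=2))  # the two latest orders
```

Calling it with the default n returns only the latest order, which is the case the question asks about.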
SELECT top 1
orders.order_id,
orders.order_date,
orders.payment_type,
orders.cardnumber,
packages.Package_name,
orders.package_id,
packages.package_price
FROM orders
INNER JOIN packages ON orders.package_id = packages.Package_ID
WHERE (orders.username = @username )
ORDER BY orders.order_date DESC
In fact assuming orders.order_id is an Identity column:
SELECT top 1
orders.order_id,
orders.order_date,
orders.payment_type,
orders.cardnumber,
packages.Package_name,
orders.package_id,
packages.package_price
FROM orders
INNER JOIN packages ON orders.package_id = packages.Package_ID
WHERE (orders.username = @username )
ORDER BY orders.order_id DESC