Advanced Queries and Joins

Learn to combine data from multiple tables with joins, subqueries, and window functions

TLDR: Use INNER JOIN to combine related rows from two tables. LEFT JOIN includes all rows from the first table even if there's no match. Subqueries let you nest queries. Window functions perform calculations across rows without grouping. CTEs make complex queries readable.

Most real queries need data from multiple tables. Joins combine tables based on relationships.

Understanding Joins

Given two tables:

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  username VARCHAR(50)
);

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  title VARCHAR(200)
);

INSERT INTO users (id, username) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO posts (user_id, title) VALUES
  (1, 'Alice First Post'),
  (1, 'Alice Second Post'),
  (2, 'Bob Post');

Users:

 id | username
----+----------
  1 | alice
  2 | bob
  3 | carol

Posts:

 id | user_id | title
----+---------+-------------------
  1 |       1 | Alice First Post
  2 |       1 | Alice Second Post
  3 |       2 | Bob Post

INNER JOIN

Returns only matching rows:

SELECT users.username, posts.title
FROM users
INNER JOIN posts ON users.id = posts.user_id;

Result:

 username |       title
----------+-------------------
 alice    | Alice First Post
 alice    | Alice Second Post
 bob      | Bob Post

Carol doesn't appear - she has no posts.

Table aliases make queries shorter:

SELECT u.username, p.title
FROM users u
INNER JOIN posts p ON u.id = p.user_id;

LEFT JOIN

Returns all rows from the left table, with NULLs for non-matches:

SELECT u.username, p.title
FROM users u
LEFT JOIN posts p ON u.id = p.user_id;

Result:

 username |       title
----------+-------------------
 alice    | Alice First Post
 alice    | Alice Second Post
 bob      | Bob Post
 carol    | NULL

Carol appears even though she has no posts.

Find users with no posts:

SELECT u.username
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE p.id IS NULL;

RIGHT JOIN

Opposite of LEFT JOIN - all rows from right table:

SELECT u.username, p.title
FROM posts p
RIGHT JOIN users u ON u.id = p.user_id;

Same result as LEFT JOIN with tables swapped. Most people prefer LEFT JOIN for consistency.

FULL OUTER JOIN

All rows from both tables:

SELECT u.username, p.title
FROM users u
FULL OUTER JOIN posts p ON u.id = p.user_id;

Rarely needed, but useful for finding unmatched rows in either table.

Multiple Joins

Join more than two tables:

CREATE TABLE comments (
  id SERIAL PRIMARY KEY,
  post_id INTEGER REFERENCES posts(id),
  user_id INTEGER REFERENCES users(id),
  content TEXT
);

INSERT INTO comments (post_id, user_id, content) VALUES
  (1, 2, 'Great post!'),
  (1, 3, 'Thanks for sharing');

SELECT
  p.title AS post_title,
  u.username AS author,
  c.content AS comment,
  cu.username AS commenter
FROM posts p
JOIN users u ON p.user_id = u.id
JOIN comments c ON c.post_id = p.id
JOIN users cu ON c.user_id = cu.id;

Result:

    post_title    | author | comment           | commenter
------------------+--------+-------------------+-----------
 Alice First Post | alice  | Great post!       | bob
 Alice First Post | alice  | Thanks for sharing| carol

Self Joins

Join a table to itself:

CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  manager_id INTEGER REFERENCES employees(id)
);

INSERT INTO employees (name, manager_id) VALUES
  ('CEO', NULL),
  ('VP Sales', 1),
  ('VP Engineering', 1),
  ('Sales Rep', 2),
  ('Developer', 3);

-- Find employees and their managers
SELECT
  e.name AS employee,
  m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Result:

    employee     | manager
-----------------+---------
 CEO             | NULL
 VP Sales        | CEO
 VP Engineering  | CEO
 Sales Rep       | VP Sales
 Developer       | VP Engineering

Subqueries

Queries within queries.

Subquery in WHERE

-- Posts by users with more than 5 posts
SELECT title
FROM posts
WHERE user_id IN (
  SELECT user_id
  FROM posts
  GROUP BY user_id
  HAVING COUNT(*) > 5
);

Subquery in SELECT

SELECT
  u.username,
  (SELECT COUNT(*) FROM posts p WHERE p.user_id = u.id) AS post_count
FROM users u;

Result:

 username | post_count
----------+------------
 alice    |          2
 bob      |          1
 carol    |          0

Subquery in FROM

SELECT avg_posts.username, avg_posts.post_count
FROM (
  SELECT u.username, COUNT(p.id) AS post_count
  FROM users u
  LEFT JOIN posts p ON u.id = p.user_id
  GROUP BY u.username
) AS avg_posts
WHERE avg_posts.post_count > 0;

Common Table Expressions (CTEs)

CTEs make complex queries readable. Use WITH:

WITH user_post_counts AS (
  SELECT u.id, u.username, COUNT(p.id) AS post_count
  FROM users u
  LEFT JOIN posts p ON u.id = p.user_id
  GROUP BY u.id, u.username
)
SELECT * FROM user_post_counts WHERE post_count > 1;

Multiple CTEs:

WITH active_users AS (
  SELECT * FROM users WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
),
recent_posts AS (
  SELECT * FROM posts WHERE created_at > CURRENT_DATE - INTERVAL '7 days'
)
SELECT u.username, COUNT(p.id) AS recent_post_count
FROM active_users u
LEFT JOIN recent_posts p ON u.id = p.user_id
GROUP BY u.username;

Window Functions

Perform calculations across related rows without GROUP BY.

ROW_NUMBER

SELECT
  username,
  created_at,
  ROW_NUMBER() OVER (ORDER BY created_at) AS signup_order
FROM users;

Result:

 username | created_at          | signup_order
----------+---------------------+--------------
 alice    | 2024-01-01 10:00:00 | 1
 bob      | 2024-01-02 11:00:00 | 2
 carol    | 2024-01-03 12:00:00 | 3

RANK and DENSE_RANK

SELECT
  username,
  post_count,
  RANK() OVER (ORDER BY post_count DESC) AS rank,
  DENSE_RANK() OVER (ORDER BY post_count DESC) AS dense_rank
FROM (
  SELECT u.username, COUNT(p.id) AS post_count
  FROM users u
  LEFT JOIN posts p ON u.id = p.user_id
  GROUP BY u.username
) user_posts;

PARTITION BY

Calculate within groups:

SELECT
  p.title,
  u.username,
  p.views,
  AVG(p.views) OVER (PARTITION BY u.id) AS user_avg_views
FROM posts p
JOIN users u ON p.user_id = u.id;

Shows each post's views and the user's average views across all their posts.

LAG and LEAD

Access previous/next row values:

SELECT
  title,
  created_at,
  LAG(created_at) OVER (ORDER BY created_at) AS prev_post_date,
  LEAD(created_at) OVER (ORDER BY created_at) AS next_post_date
FROM posts;

Useful for calculating time between events.

UNION and UNION ALL

Combine results from multiple queries:

SELECT username AS name, 'user' AS type FROM users
UNION ALL
SELECT title AS name, 'post' AS type FROM posts;

UNION removes duplicates, UNION ALL keeps them (faster).

Practical Examples

Find top authors:

SELECT
  u.username,
  COUNT(p.id) AS post_count,
  SUM(p.views) AS total_views
FROM users u
JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.username
ORDER BY total_views DESC
LIMIT 10;

Posts with most comments:

SELECT
  p.title,
  u.username AS author,
  COUNT(c.id) AS comment_count
FROM posts p
JOIN users u ON p.user_id = u.id
LEFT JOIN comments c ON c.post_id = p.id
GROUP BY p.id, p.title, u.username
ORDER BY comment_count DESC
LIMIT 5;

User activity summary:

WITH user_stats AS (
  SELECT
    u.id,
    u.username,
    COUNT(DISTINCT p.id) AS posts,
    COUNT(DISTINCT c.id) AS comments
  FROM users u
  LEFT JOIN posts p ON p.user_id = u.id
  LEFT JOIN comments c ON c.user_id = u.id
  GROUP BY u.id, u.username
)
SELECT
  username,
  posts,
  comments,
  posts + comments AS total_activity
FROM user_stats
ORDER BY total_activity DESC;

Understanding joins and these advanced query techniques lets you extract exactly the data you need. Next, we'll explore indexes - how to make these queries fast even with millions of rows.

Found an issue?