SQL is 50 years old and still the most important query language you'll ever learn. Whether you're building web apps, analyzing data, or just trying to understand what's in a database, being comfortable with SQL will save you hours. The good news is that most of what you need day-to-day is a fairly small set of concepts.
Let's go from zero to writing real queries.
What SQL Is (and Why It's Still Relevant)
SQL — Structured Query Language — is a declarative language for working with relational databases. You describe what you want, not how to get it. The database engine figures out the execution plan.
A relational database stores data in tables: rows and columns, like a spreadsheet but with enforced structure and relationships. SQL lets you query, insert, update, and delete data, and combine information from multiple tables in one operation.
Every major relational database — PostgreSQL, MySQL, SQLite, SQL Server, Oracle — speaks SQL with minor dialect differences. We'll flag the important ones as we go. (For canonical SELECT semantics, see the SQLite SELECT documentation.)
SELECT: Getting Data Out
The most fundamental query is SELECT ... FROM ...:
SELECT name, email FROM users;
This returns every row from the users table, but only the name and email columns. To get all columns:
SELECT * FROM users;
SELECT * is fine for exploration, but in production code always name your columns explicitly. It makes queries resilient to schema changes and clearer to anyone reading them later.
WHERE: Filtering Rows
Add a WHERE clause to filter which rows you get back:
SELECT name, email FROM users WHERE active = true;
SELECT * FROM orders WHERE total > 100 AND status = 'paid';
You can combine conditions with AND, OR, and NOT. String matching uses LIKE with % as a wildcard:
SELECT * FROM products WHERE name LIKE 'Widget%';
ORDER BY and LIMIT
Sort results with ORDER BY:
SELECT name, created_at FROM users ORDER BY created_at DESC;
DESC for descending (newest first), ASC (the default) for ascending. Use LIMIT to cap how many rows you get back:
SELECT * FROM orders ORDER BY total DESC LIMIT 10;
This gives you the 10 highest-value orders — a pattern you'll use constantly.
Aggregates: Counting and Summarizing
Aggregate functions collapse multiple rows into a single value:
SELECT COUNT(*) FROM users; -- total user count
SELECT SUM(total) FROM orders; -- total revenue
SELECT AVG(total) FROM orders; -- average order value
SELECT MAX(total), MIN(total) FROM orders;
GROUP BY
GROUP BY lets you apply aggregates to groups of rows:
SELECT status, COUNT(*) as count
FROM orders
GROUP BY status;
This returns one row per distinct status value with the count of orders in each group.
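To see the shape of that result concretely, here's a minimal sketch using Python's built-in sqlite3 module (the orders table and its status values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("paid",), ("paid",), ("pending",), ("paid",), ("refunded",)])

# One output row per distinct status, with the count for that group
rows = conn.execute("""
    SELECT status, COUNT(*) AS count
    FROM orders
    GROUP BY status
    ORDER BY status
""").fetchall()
print(rows)  # [('paid', 3), ('pending', 1), ('refunded', 1)]
```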
HAVING
WHERE filters rows before aggregation. HAVING filters after:
SELECT customer_id, SUM(total) as revenue
FROM orders
GROUP BY customer_id
HAVING SUM(total) > 500;
This returns only customers whose total order value exceeds 500. You can't do that with WHERE because the aggregate doesn't exist yet when WHERE runs.
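You can watch this ordering rule in action with SQLite via Python's sqlite3 module (table and numbers invented for the demo): the HAVING version works, while the same condition in WHERE is rejected by the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 300), (1, 400), (2, 100), (3, 600)])

# HAVING filters on the aggregate, after grouping
big_spenders = conn.execute("""
    SELECT customer_id, SUM(total) AS revenue
    FROM orders
    GROUP BY customer_id
    HAVING SUM(total) > 500
    ORDER BY customer_id
""").fetchall()
print(big_spenders)  # [(1, 700.0), (3, 600.0)]

# The same condition in WHERE is an error:
# the aggregate doesn't exist yet when WHERE runs
try:
    conn.execute("SELECT customer_id FROM orders WHERE SUM(total) > 500")
except sqlite3.OperationalError as e:
    print(e)  # misuse of aggregate (exact wording varies by version)
```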
JOINs: Combining Tables
JOINs are where SQL gets powerful. If your orders table stores a user_id but you also want the user's name, you join the two tables together. The full set of join types is documented in the MySQL JOIN reference.
INNER JOIN
Returns only rows that have a match in both tables:
SELECT users.name, orders.total, orders.created_at
FROM orders
INNER JOIN users ON orders.user_id = users.id;
If an order has no matching user, it won't appear. If a user has no orders, they won't appear either.
LEFT JOIN
Returns all rows from the left table, with matching rows from the right table — and NULL where there's no match:
SELECT users.name, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON orders.user_id = users.id
GROUP BY users.id, users.name;
This lists every user and their order count, including users who've never placed an order. For those users the joined order columns are NULL, and COUNT(orders.id) ignores NULLs, so they show 0. (COUNT(*) would count the NULL row and wrongly report 1.)
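Here's that behavior in a runnable sketch with Python's sqlite3 module (users and orders are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id, name) VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders (user_id) VALUES (1), (1);
""")

# LEFT JOIN keeps Bob even though he has no orders;
# COUNT(orders.id) skips the NULL order ids, so his count is 0
rows = conn.execute("""
    SELECT users.name, COUNT(orders.id) AS order_count
    FROM users
    LEFT JOIN orders ON orders.user_id = users.id
    GROUP BY users.id, users.name
    ORDER BY users.name
""").fetchall()
print(rows)  # [('Alice', 2), ('Bob', 0)]
```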
RIGHT JOIN
The mirror image of LEFT JOIN — all rows from the right table. In practice, most developers rewrite RIGHT JOIN as a LEFT JOIN with the tables swapped, because it's easier to read.
A simple way to remember it: the "outer" side of the join is the one that can contribute NULL rows. LEFT means the left table is outer.
Modifying Data
INSERT
INSERT INTO users (name, email, created_at)
VALUES ('Alice', 'alice@example.com', NOW());
You can insert multiple rows in one statement:
INSERT INTO tags (name) VALUES ('javascript'), ('typescript'), ('node');
UPDATE
UPDATE users SET active = false WHERE last_login < '2024-01-01';
Always use a WHERE clause with UPDATE. Without one, you update every row in the table. Most SQL clients will warn you, but not all do.
DELETE
DELETE FROM sessions WHERE expires_at < NOW();
Same rule: always scope your DELETE with WHERE. A bare DELETE FROM users is catastrophic in production.
Subqueries
A subquery is a query nested inside another query — useful for composing operations:
SELECT name, email FROM users
WHERE id IN (
SELECT user_id FROM orders WHERE total > 1000
);
This returns users who have at least one order over 1000. You could also write it as a JOIN, which the planner can often optimize more aggressively, but the JOIN version needs DISTINCT (otherwise a user appears once per qualifying order). Subqueries are often easier to read when you're first composing a query.
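The two forms can be compared side by side with SQLite via Python's sqlite3 module (sample data invented); note that the JOIN version needs DISTINCT to avoid repeating a user once per qualifying order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'Alice', 'a@example.com'), (2, 'Bob', 'b@example.com');
    INSERT INTO orders (user_id, total) VALUES (1, 1500), (1, 2000), (2, 50);
""")

# Subquery version: Alice appears once, even with two orders over 1000
sub = conn.execute("""
    SELECT name FROM users
    WHERE id IN (SELECT user_id FROM orders WHERE total > 1000)
""").fetchall()
print(sub)  # [('Alice',)]

# JOIN version: DISTINCT collapses the duplicate rows from her two orders
joined = conn.execute("""
    SELECT DISTINCT users.name
    FROM users
    JOIN orders ON orders.user_id = users.id
    WHERE orders.total > 1000
""").fetchall()
print(joined)  # [('Alice',)]
```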
Common Beginner Mistakes
Not using aliases. Once you're JOINing multiple tables, column names can collide. Use table.column or AS aliases.
Forgetting NULL handling. WHERE age = NULL doesn't work. NULL comparisons require IS NULL and IS NOT NULL. This trips up almost everyone at first.
Assuming ORDER BY without LIMIT is free. Sorting a million rows you don't need is expensive. If you only want the top 10, say so.
Using SELECT * in production joins. When you join two tables that both have an id column, SELECT * returns both. Explicit column lists avoid the ambiguity.
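The NULL pitfall above is easy to reproduce with SQLite via Python's sqlite3 module (table and data made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Alice", 30), ("Bob", None)])

# '= NULL' never matches: the comparison evaluates to NULL, not true
no_match = conn.execute("SELECT name FROM users WHERE age = NULL").fetchall()
print(no_match)  # []

# IS NULL is the correct check
nulls = conn.execute("SELECT name FROM users WHERE age IS NULL").fetchall()
print(nulls)  # [('Bob',)]
```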
Dialect Differences
The SQL you write for PostgreSQL is close to what works in MySQL, but not identical. Key differences to watch for:
- String concatenation: `||` in PostgreSQL/SQLite, the `CONCAT()` function in MySQL
- Date/time functions: `NOW()` works in both PostgreSQL and MySQL; SQLite uses `datetime('now')`
- Boolean literals: `true`/`false` in PostgreSQL; `1`/`0` in MySQL and SQLite
- `LIMIT` vs `TOP`: SQLite/PostgreSQL/MySQL use `LIMIT`; SQL Server uses `SELECT TOP n`
- Full-text search: completely different implementations between engines
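The concatenation difference is quick to verify in SQLite via Python's sqlite3 module (the literal strings are arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite, like PostgreSQL, concatenates strings with ||;
# the MySQL equivalent would be CONCAT('Hello', ', ', 'world')
greeting = conn.execute("SELECT 'Hello' || ', ' || 'world'").fetchone()[0]
print(greeting)  # Hello, world
```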
For learning purposes, SQLite is the easiest to get started with — it's a single file, no server needed. PostgreSQL is the best choice when you want to graduate to production work — its official tutorial covers dialect specifics and advanced features.
You can format and pretty-print your SQL queries using our SQL Formatter tool, which makes long queries much easier to read and review. If you're working with data that came from a spreadsheet, CSV to JSON can help you reshape it before importing.
For a deeper dive into how databases actually execute your queries, check out Understanding Database Indexes — it's essential reading once your tables grow past a few thousand rows.
SQL is one of those skills that compounds. The more you write it, the more naturally you think in sets and relations. Start with simple SELECT queries, add JOINs when you're comfortable, and you'll be writing useful real-world queries within a day of practice.
FAQ
What's the right first SQL database to learn on?
SQLite. It's a single file, no server to install, no permissions to configure. Python ships with the sqlite3 module, macOS includes the sqlite3 command-line tool, and it's trivially installable everywhere else. The SQL dialect is close enough to PostgreSQL and MySQL that your skills will port cleanly. Once you're comfortable with SELECT, JOIN, and aggregates in SQLite, switching to PostgreSQL takes a few hours of dialect adjustment.
Should I learn ORM (Prisma, Drizzle, SQLAlchemy) or raw SQL first?
Raw SQL first. ORMs hide the query plan and make it easy to write code that issues 100 queries when 1 would do (the N+1 problem). Once you understand what SQL the database is running, ORMs become productivity tools instead of black boxes. The strongest backend developers know both — write the SQL by hand for complex analytics; use the ORM for boring CRUD.
What's the difference between WHERE and HAVING?
WHERE filters rows before any aggregation happens; HAVING filters after GROUP BY aggregation. You can't use aggregate functions like COUNT(*) or SUM() in WHERE because the aggregate doesn't exist yet at that point in query execution. Pattern: use WHERE for row-level filters (status = 'paid'), HAVING for aggregate-level filters (COUNT(*) > 10).
When does a JOIN beat a subquery and vice versa?
JOINs usually beat subqueries on performance because the planner can optimize them more aggressively (hash joins, merge joins, parallelism). Subqueries are often clearer for "find rows where X has any matching Y" patterns, and the optimizer can sometimes flatten them into joins automatically. For correlated subqueries (subquery references outer query columns), JOINs are almost always faster.
What's a window function and when do I need one?
Window functions let you compute a value across a "window" of rows related to the current row, without collapsing them via GROUP BY. Examples: running totals, ranking, moving averages, percentiles. Syntax: SUM(amount) OVER (PARTITION BY user_id ORDER BY date). They're indispensable for analytics queries — "rank users by total spend within their country" is trivial with RANK() OVER, ugly without.
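The "rank within country" example can be sketched with SQLite via Python's sqlite3 module (made-up spend data; window functions require SQLite 3.25 or newer, which any recent Python bundles):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (user_id TEXT, country TEXT, total REAL)")
conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", [
    ("alice", "US", 900), ("bob", "US", 400),
    ("carol", "DE", 700), ("dave", "DE", 100),
])

# Rank users by spend within their country, without collapsing rows
rows = conn.execute("""
    SELECT user_id, country,
           RANK() OVER (PARTITION BY country ORDER BY total DESC) AS rnk
    FROM spend
    ORDER BY country, rnk
""").fetchall()
print(rows)
# [('carol', 'DE', 1), ('dave', 'DE', 2), ('alice', 'US', 1), ('bob', 'US', 2)]
```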
Why doesn't `WHERE column = NULL` work?
Because NULL means "unknown," and "is unknown equal to unknown?" returns NULL (not true), so the row doesn't match. SQL has three-valued logic: TRUE, FALSE, and NULL. To check for null, use IS NULL or IS NOT NULL. Same applies to <> — WHERE column <> 'foo' excludes rows where column is NULL, which is often surprising. The fix is WHERE column <> 'foo' OR column IS NULL.
How do I avoid SQL injection?
Use parameterized queries (also called prepared statements) — never concatenate user input into SQL strings. In Node: pg.query('SELECT * FROM users WHERE id = $1', [userId]). In Python: cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,)). The placeholder syntax differs by driver but the principle is the same: SQL and data travel separately, so no input can become executable SQL. ORMs do this for you automatically.
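Here's the same principle in a runnable sketch with Python's sqlite3 driver, which uses ? placeholders (the table and the malicious input are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")

# Malicious input: concatenated into the SQL string, the quote would
# break out of the literal and change the query's meaning
user_input = "1 OR '1'='1"

# Parameterized query: the input travels as data, never parsed as SQL,
# so the attack is inert and no row matches the literal string
rows = conn.execute("SELECT name FROM users WHERE id = ?",
                    (user_input,)).fetchall()
print(rows)  # []
```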
When should I add an index?
When you have a column you frequently filter or join on, and the table is large enough that scanning every row hurts. Indexes have a cost (slower writes, more disk space), so don't index every column. Common patterns: index foreign keys, index columns in WHERE and ORDER BY clauses, index columns used in JOIN conditions. Use EXPLAIN (or EXPLAIN ANALYZE in Postgres) to see what indexes the planner is using and where seq scans appear.
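A minimal before/after with SQLite's EXPLAIN QUERY PLAN, via Python's sqlite3 module, shows the planner switching from a scan to an index search (table and index names are made up; the exact plan text varies a little by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

# Without an index, the plan is a full table scan
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan_before[0][-1])  # e.g. 'SCAN orders'

# Index the filtered column and the planner uses an index search instead
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan_after[0][-1])  # e.g. 'SEARCH orders USING INDEX idx_orders_customer ...'
```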