Last update: 2024-01-15

PostgreSQL Reference for Developers

Queries, joins, indexes, transactions, and the things that actually matter when building on Postgres

🔗Data Definition: Tables

    -- Create a table
    CREATE TABLE users (
        id          BIGSERIAL PRIMARY KEY,        -- auto-incrementing 64-bit integer PK
        email       TEXT      NOT NULL UNIQUE,     -- enforced at DB level
        name        TEXT      NOT NULL,
        age         INTEGER   CHECK (age >= 0),
        role        TEXT      NOT NULL DEFAULT 'user',
        created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
        deleted_at  TIMESTAMPTZ                   -- NULL = not deleted (soft delete)
    );

    -- Prefer BIGSERIAL or BIGINT GENERATED ALWAYS AS IDENTITY over SERIAL
    -- (column definition, goes inside CREATE TABLE)
    id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY

    -- Or use a UUID primary key (gen_random_uuid() is built in since Pg 13;
    -- older versions need the pgcrypto extension)
    id  UUID PRIMARY KEY DEFAULT gen_random_uuid()

    -- Alter an existing table
    ALTER TABLE users ADD COLUMN bio TEXT;
    ALTER TABLE users ALTER COLUMN name SET NOT NULL;
    ALTER TABLE users DROP COLUMN bio;
    ALTER TABLE users RENAME COLUMN name TO full_name;
    ALTER TABLE users RENAME TO accounts;

    -- Drop
    DROP TABLE users;
    DROP TABLE IF EXISTS users CASCADE; -- CASCADE also drops dependent views and FK constraints

🔗Foreign Keys

    CREATE TABLE posts (
        id         BIGSERIAL PRIMARY KEY,
        user_id    BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        title      TEXT NOT NULL,
        body       TEXT,
        published  BOOLEAN NOT NULL DEFAULT false,
        created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );

    -- ON DELETE options:
    -- CASCADE     → delete child rows when parent is deleted
    -- SET NULL    → set FK column to NULL when parent is deleted
    -- SET DEFAULT → set FK column to its default value
    -- RESTRICT    → prevent parent deletion if children exist (checked immediately)
    -- NO ACTION   → like RESTRICT, but the check can be deferred to the end of
    --               the transaction on DEFERRABLE constraints (this is the default)

    -- Always index foreign key columns (Postgres does NOT do this automatically)
    CREATE INDEX ON posts(user_id);

Postgres does not automatically create indexes on foreign key columns. Every FK column that you JOIN or filter on needs a manual CREATE INDEX. Missing FK indexes cause sequential scans and are one of the most common Postgres performance mistakes.
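
One way to spot the gap is to ask the system catalogs which foreign keys have no index led by their columns. The query below is a rough sketch rather than an official tool: it only checks that the leading columns of some index cover the FK columns, and it does not distinguish partial indexes from full ones.

```sql
-- Foreign keys with no index whose leading columns cover the FK columns.
SELECT c.conrelid::regclass AS table_name,
       c.conname            AS fk_name
FROM pg_constraint c
WHERE c.contype = 'f'                          -- foreign-key constraints only
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid = c.conrelid
        -- indkey is 0-based; take as many leading index columns as the FK has
        AND (i.indkey::int2[])[0:cardinality(c.conkey) - 1] @> c.conkey
  )
ORDER BY c.conrelid::regclass::text;
```

Each row returned is a candidate for a CREATE INDEX on the referencing columns.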

🔗INSERT, UPDATE, DELETE

    -- Insert a single row
    INSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice');

    -- Insert and return the generated id
    INSERT INTO users (email, name) VALUES ('bob@example.com', 'Bob') RETURNING id;

    -- Insert multiple rows
    INSERT INTO users (email, name) VALUES
        ('carol@example.com', 'Carol'),
        ('dave@example.com',  'Dave');

    -- Upsert: insert or update on conflict
    -- EXCLUDED refers to the row that was proposed for insertion
    INSERT INTO users (email, name)
    VALUES ('alice@example.com', 'Alice Updated')
    ON CONFLICT (email)
    DO UPDATE SET name = EXCLUDED.name;

    -- Ignore on conflict (do nothing)
    INSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice')
    ON CONFLICT DO NOTHING;

    -- Update
    UPDATE users SET name = 'Alice Smith', role = 'admin' WHERE id = 1;

    -- Update and return modified rows
    UPDATE users SET role = 'admin' WHERE email = 'alice@example.com' RETURNING *;

    -- Delete
    DELETE FROM users WHERE id = 1;
    DELETE FROM users WHERE deleted_at < now() - INTERVAL '30 days' RETURNING id;

🔗SELECT Fundamentals

    -- Basic select
    SELECT id, name, email FROM users;
    SELECT * FROM users;                     -- avoid * in production queries

    -- Filter
    SELECT * FROM users WHERE role = 'admin' AND age > 18;
    SELECT * FROM users WHERE id IN (1, 2, 3);
    -- Careful: NOT IN returns no rows at all if the subquery yields a NULL;
    -- NOT EXISTS avoids that trap
    SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned);
    SELECT * FROM users WHERE deleted_at IS NULL;       -- NULL check
    SELECT * FROM users WHERE deleted_at IS NOT NULL;

    -- LIKE and ILIKE (case-insensitive)
    SELECT * FROM users WHERE name LIKE 'A%';       -- starts with A
    SELECT * FROM users WHERE email ILIKE '%@gmail.com';

    -- Sort
    SELECT * FROM users ORDER BY created_at DESC;
    SELECT * FROM users ORDER BY role ASC, name ASC;
    SELECT * FROM users ORDER BY name NULLS LAST;    -- push NULLs to the end

    -- Limit and offset (pagination)
    SELECT * FROM users ORDER BY id LIMIT 20 OFFSET 40; -- page 3 at 20 rows per page

OFFSET pagination is slow on large tables: OFFSET 10000 LIMIT 20 makes Postgres produce and discard 10,000 rows before returning 20. Use keyset (cursor) pagination instead: WHERE id > :last_id ORDER BY id LIMIT 20. It stays fast at any depth and does not skip or repeat rows when new rows are inserted between page loads.
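
When the sort column is not unique (for example created_at), keyset pagination needs a tie-breaker. A minimal sketch, assuming the client remembers the last row of the previous page and passes it back as the bind parameters :last_created_at and :last_id (names chosen here for illustration):

```sql
-- First page: newest users first, id breaks ties
SELECT id, name, created_at
FROM users
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page: continue strictly after the last row already shown.
-- The row comparison (a, b) < (x, y) means: a < x, or a = x and b < y.
SELECT id, name, created_at
FROM users
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- An index in the same column order lets both queries avoid a sort
CREATE INDEX ON users (created_at DESC, id DESC);
```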

🔗Aggregates and GROUP BY

    -- Aggregate functions
    SELECT COUNT(*) FROM users;
    SELECT COUNT(*) FROM users WHERE role = 'admin';
    SELECT AVG(age), MIN(age), MAX(age), SUM(age) FROM users;

    -- GROUP BY
    SELECT role, COUNT(*) AS total
    FROM users
    GROUP BY role
    ORDER BY total DESC;

    -- HAVING: filter on aggregated values (WHERE runs before aggregation, HAVING after)
    SELECT user_id, COUNT(*) AS post_count
    FROM posts
    GROUP BY user_id
    HAVING COUNT(*) > 5
    ORDER BY post_count DESC;

    -- Count distinct values
    SELECT COUNT(DISTINCT user_id) FROM posts;

    -- Conditional aggregation
    SELECT
        COUNT(*) AS total,
        COUNT(*) FILTER (WHERE role = 'admin') AS admins,
        COUNT(*) FILTER (WHERE deleted_at IS NOT NULL) AS deleted
    FROM users;

🔗Subqueries and CTEs

    -- Subquery in WHERE
    SELECT * FROM posts
    WHERE user_id IN (
        SELECT id FROM users WHERE role = 'admin'
    );

    -- EXISTS: stops at the first match per outer row (usually planned like IN)
    SELECT * FROM users u
    WHERE EXISTS (
        SELECT 1 FROM posts p WHERE p.user_id = u.id
    );

    -- CTE (Common Table Expression): named subquery
    -- Since Pg 12 a CTE referenced once is inlined; one referenced several times is computed once
    WITH active_users AS (
        SELECT id, name FROM users WHERE deleted_at IS NULL
    ),
    prolific AS (
        SELECT user_id, COUNT(*) AS post_count
        FROM posts
        GROUP BY user_id
        HAVING COUNT(*) > 10
    )
    SELECT u.name, p.post_count
    FROM active_users u
    JOIN prolific p ON p.user_id = u.id
    ORDER BY p.post_count DESC;

    -- Recursive CTE: for trees and hierarchies
    WITH RECURSIVE org_tree AS (
        SELECT id, name, manager_id, 0 AS depth
        FROM employees WHERE manager_id IS NULL     -- base case: root
        UNION ALL
        SELECT e.id, e.name, e.manager_id, t.depth + 1
        FROM employees e
        JOIN org_tree t ON e.manager_id = t.id          -- recursive case
    )
    SELECT * FROM org_tree ORDER BY depth, name;
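
Since Pg 12 you can override the planner's choice of whether a CTE is computed once or folded into the outer query. The sketch below reuses the users and posts tables from earlier sections; the CTE names are illustrative.

```sql
-- MATERIALIZED: compute the CTE once and reuse the result (an optimization fence)
WITH recent_posts AS MATERIALIZED (
    SELECT user_id, created_at
    FROM posts
    WHERE created_at > now() - INTERVAL '7 days'
)
SELECT u.name, COUNT(*) AS posts_this_week
FROM recent_posts r
JOIN users u ON u.id = r.user_id
GROUP BY u.id, u.name;

-- NOT MATERIALIZED: allow inlining even though the CTE is referenced twice
WITH admins AS NOT MATERIALIZED (
    SELECT id FROM users WHERE role = 'admin'
)
SELECT (SELECT COUNT(*) FROM admins) AS admin_count,
       (SELECT MIN(id)  FROM admins) AS first_admin_id;
```

Compare the EXPLAIN output of both forms before settling on one; which is faster depends on the data.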

🔗JOIN Types

Join Type	Returns	When to Use
`INNER JOIN`	Rows where the condition matches in both tables	You only want rows that have a matching partner
`LEFT JOIN`	All rows from left table; NULLs where no match in right	Optional relationship — keep left rows even if no match
`RIGHT JOIN`	All rows from right table; NULLs where no match in left	Rarely used; just flip the tables and use LEFT JOIN
`FULL OUTER JOIN`	All rows from both tables; NULLs where no match	Find rows that exist in one table but not the other
`CROSS JOIN`	Cartesian product — every combination	Generating test data; matrix comparisons

    -- INNER JOIN: only posts that have a matching user
    SELECT p.title, u.name
    FROM posts p
    JOIN users u ON u.id = p.user_id;  -- JOIN = INNER JOIN

    -- LEFT JOIN: all users, even those with no posts
    SELECT u.name, COUNT(p.id) AS post_count
    FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY post_count DESC;

    -- LEFT JOIN to find rows with NO match (anti-join pattern)
    SELECT u.* FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    WHERE p.id IS NULL;  -- users who have never posted

    -- Multiple joins
    SELECT p.title, u.name AS author, c.body AS comment
    FROM posts p
    JOIN users u ON u.id = p.user_id
    LEFT JOIN comments c ON c.post_id = p.id
    WHERE p.published = true
    ORDER BY p.created_at DESC;

    -- Self-join: join a table to itself (e.g., employee → manager)
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    LEFT JOIN employees m ON m.id = e.manager_id;

For inner joins, the order in which you write the tables doesn't change the results, and Postgres's query planner reorders them for efficiency anyway (for up to join_collapse_limit tables, 8 by default). Put the smaller or more-filtered table first purely for readability. Prefer explicit JOIN over a comma-separated FROM list: it makes intent clear and avoids accidental cartesian products.

🔗UNION, INTERSECT, EXCEPT

    -- UNION: combine results, remove duplicates
    SELECT email FROM users
    UNION
    SELECT email FROM pending_users;

    -- UNION ALL: combine results, keep duplicates (faster: no dedup step)
    SELECT id, name, 'user' AS source FROM users
    UNION ALL
    SELECT id, name, 'admin' AS source FROM admins;

    -- INTERSECT: rows in both result sets
    SELECT email FROM newsletter_subscribers
    INTERSECT
    SELECT email FROM paying_customers;

    -- EXCEPT: rows in first set but not second
    SELECT email FROM users
    EXCEPT
    SELECT email FROM unsubscribed;

🔗Index Basics

Indexes speed up reads by creating a separate data structure the planner can use instead of scanning the whole table. They cost write overhead and storage. Add them where you filter, sort, or join — not everywhere.

    -- Basic B-tree index (default, best for equality and range queries)
    CREATE INDEX idx_users_email ON users(email);
    CREATE INDEX ON posts(user_id);    -- auto-named
    CREATE INDEX ON posts(created_at DESC);

    -- Unique index (enforces uniqueness; a UNIQUE constraint creates one for you)
    CREATE UNIQUE INDEX ON users(email);

    -- Composite index: column order matters
    -- Useful when you filter on user_id and filter or sort on created_at
    -- Also usable for queries on just `user_id` (leftmost prefix rule)
    CREATE INDEX ON posts(user_id, created_at DESC);

    -- Partial index: only indexes rows matching a condition
    -- Smaller, faster for queries that always include that condition
    CREATE INDEX ON users(email) WHERE deleted_at IS NULL;
    CREATE INDEX ON orders(created_at) WHERE status = 'pending';

    -- Concurrent index creation: doesn't block writes (use in production)
    -- Cannot run inside a transaction block; a failed build leaves an INVALID index
    CREATE INDEX CONCURRENTLY idx_posts_title ON posts(title);

    -- Drop
    DROP INDEX idx_users_email;
    DROP INDEX CONCURRENTLY idx_posts_title; -- non-blocking drop

    -- List indexes on a table
    \d users  -- psql: shows table structure and indexes
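
If a CREATE INDEX CONCURRENTLY is cancelled or fails partway, Postgres leaves the index behind marked INVALID: it is never used for reads but still slows every write. A small sketch for finding and rebuilding such leftovers, using the idx_posts_title name from the listing above:

```sql
-- Indexes left INVALID by a failed concurrent build
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass   AS table_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop the leftover and retry the build
DROP INDEX CONCURRENTLY IF EXISTS idx_posts_title;
CREATE INDEX CONCURRENTLY idx_posts_title ON posts(title);
```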

🔗Index Types

Type	Best For	Notes
`B-tree`	Equality, range, sorting, LIKE 'prefix%'	Default. Use for almost everything.
`Hash`	Equality only (`=`)	Rarely worth choosing over B-tree, which handles equality well. Crash-safe (WAL-logged) only since Pg 10.
`GIN`	Arrays, JSONB, full-text search, `@>`, `?`	Required for indexing JSONB keys/values and array containment.
`GiST`	Geometric types, full-text, ranges	Used by PostGIS. Also for `tsvector` full-text search.
`BRIN`	Very large tables with naturally ordered data (time-series, logs)	Tiny. Fast to build. Poor selectivity. Good for append-only partitioned tables.

    -- GIN index for JSONB (speeds up the @>, ?, ?|, ?& operators; they also work without it)
    CREATE INDEX ON events USING gin(data);

    -- GIN index for full-text search
    CREATE INDEX ON articles USING gin(to_tsvector('english', title || ' ' || body));

    -- BRIN index for a large log table (created_at is always increasing)
    CREATE INDEX ON logs USING brin(created_at);

🔗EXPLAIN: Reading the Query Plan

    -- Show the query plan (no execution)
    EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

    -- Show plan WITH actual execution stats (this really runs the query)
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'alice@example.com';

    -- Key things to look for in EXPLAIN output:
    -- Seq Scan     → full table scan. BAD on large tables when few rows match. Missing index?
    -- Index Scan   → uses index to find rows, fetches from heap. GOOD.
    -- Index Only Scan → all data in index, no heap access. BEST.
    -- Bitmap Scan  → index scan for many rows, batch heap access. OK.
    -- Nested Loop  → for each row in outer, scan inner. Fast if inner is small.
    -- Hash Join    → build hash table from smaller side. Fast for large joins.
    -- Merge Join   → merge two sorted inputs. Fast when both sides are sorted.
    -- cost=X..Y    → estimated startup cost .. total cost (arbitrary planner units)
    -- rows=N       → estimated row count (inaccurate = stale statistics → ANALYZE)
    -- actual time=X..Y rows=N loops=N → real execution data (with ANALYZE)

    -- Update statistics if estimates are way off
    ANALYZE users;
    ANALYZE;  -- analyze all tables

Paste your EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) output into explain.dalibo.com for a visual tree with highlighted bottlenecks. For complex plans it is far easier to read than raw text output.
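
Because EXPLAIN ANALYZE executes the statement, running it on an INSERT, UPDATE, or DELETE changes data. A common way to profile a write safely is to wrap it in a transaction and roll back, sketched here against the users table from earlier:

```sql
BEGIN;

-- The UPDATE really runs, so the timings and buffer counts are real
EXPLAIN (ANALYZE, BUFFERS)
UPDATE users SET role = 'admin' WHERE email = 'alice@example.com';

ROLLBACK;  -- discard the change; the plan has already been printed
```

The rows stay locked until the ROLLBACK, so keep the transaction short on a busy table.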

🔗Transactions

A transaction groups statements into an all-or-nothing unit. Either all statements succeed and commit, or any failure rolls everything back. Postgres transactions are fully ACID compliant.

    -- Basic transaction
    BEGIN;
        UPDATE accounts SET balance = balance - 100 WHERE id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;  -- both updates committed atomically

    -- Rollback on error
    BEGIN;
        UPDATE accounts SET balance = balance - 1000 WHERE id = 1;
        -- something goes wrong in application code...
    ROLLBACK;  -- undoes the UPDATE, balance unchanged

    -- Savepoints: partial rollback within a transaction
    BEGIN;
        INSERT INTO orders (...) VALUES (...);
        SAVEPOINT after_order;
        INSERT INTO order_items (...) VALUES (...);
        -- if items insert fails:
        ROLLBACK TO SAVEPOINT after_order;
        -- order still exists, items rolled back
    COMMIT;

🔗Isolation Levels

Isolation levels control what concurrent transactions can see of each other's changes. Higher isolation means fewer anomalies, but at REPEATABLE READ and SERIALIZABLE a transaction can fail with a serialization error and must be retried.

Level	Dirty Read	Non-repeatable Read	Phantom Read	Use Case
`READ COMMITTED`	No	Yes	Yes	Default. Fine for most apps.
`REPEATABLE READ`	No	No	No*	Reports, consistent snapshots.
`SERIALIZABLE`	No	No	No	Financial ops, strict correctness.

* The SQL standard permits phantom reads at REPEATABLE READ; PostgreSQL's snapshot-based implementation prevents them.

    -- Set isolation level for a transaction
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
        SELECT SUM(balance) FROM accounts;  -- consistent snapshot for this tx
        -- other transactions can commit changes, but we don't see them
    COMMIT;
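
Under SERIALIZABLE, Postgres may abort a transaction rather than allow an outcome that no serial ordering could produce. The sketch below reuses the accounts table from the Transactions section; the retry itself has to live in application code, which is only described in comments here.

```sql
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Any statement above, including COMMIT, can fail with
-- SQLSTATE 40001 (serialization_failure).
-- When that happens: ROLLBACK, then rerun the whole transaction from BEGIN.
-- Re-read any values inside the retry; do not reuse results from the failed attempt.
```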

🔗Locking

    -- SELECT FOR UPDATE: lock rows for update, block other writers
    -- Use when you read a row and intend to update it (prevents lost updates)
    BEGIN;
    SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
    -- now we own a lock: other transactions block on this row
    UPDATE accounts SET balance = balance - 50 WHERE id = 1;
    COMMIT;

    -- SELECT FOR UPDATE SKIP LOCKED: skip rows locked by others
    -- Pattern: job queue, where workers grab available jobs without blocking each other
    BEGIN;
    SELECT * FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED;
    -- process the job...
    UPDATE jobs SET status = 'done' WHERE id = :job_id;
    COMMIT;

    -- Advisory locks: application-level locks (no table/row needed)
    -- Session-level: held until unlocked or the session ends
    SELECT pg_try_advisory_lock(12345);  -- returns true if acquired, false if not
    SELECT pg_advisory_unlock(12345);

Deadlocks happen when two transactions each hold a lock the other needs. Postgres detects them and aborts one of the transactions with an error (SQLSTATE 40P01). Prevent them by always acquiring locks in the same order across transactions, and by keeping transactions short.
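
A consistent lock order can be enforced in SQL by locking every row a transaction will touch up front, sorted by key. This is a sketch against the accounts table used above; the 5-second timeout is an arbitrary illustrative value.

```sql
BEGIN;

-- Give up instead of waiting forever if another session holds a lock
SET LOCAL lock_timeout = '5s';

-- Lock both rows in ascending id order, whichever direction the money moves
SELECT id, balance
FROM accounts
WHERE id IN (1, 2)
ORDER BY id
FOR UPDATE;

UPDATE accounts SET balance = balance - 50 WHERE id = 2;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;

COMMIT;
```

Two transfers running in opposite directions now queue on the row with the lower id instead of deadlocking.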

🔗VACUUM and Autovacuum

    -- Why VACUUM exists:
    -- Postgres uses MVCC: UPDATE and DELETE don't remove old row versions.
    -- They mark them dead. Dead rows pile up ("table bloat").
    -- VACUUM makes dead row space reusable and updates the visibility map.

    -- Manual vacuum (autovacuum usually handles this)
    VACUUM users;
    VACUUM ANALYZE users;   -- vacuum + update statistics
    VACUUM FULL users;       -- rewrite table, return space to the OS; locks table, use sparingly

    -- Check autovacuum health
    SELECT relname, n_dead_tup, n_live_tup, last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC;
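
If a hot table keeps showing a large n_dead_tup, autovacuum can be made more aggressive for that table alone. By default it waits until roughly 20% of a table's rows are dead (autovacuum_vacuum_scale_factor = 0.2). The values below are illustrative assumptions, not recommendations.

```sql
-- Vacuum users once ~2% of rows are dead, re-analyze after ~1% change
ALTER TABLE users SET (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);

-- Go back to the server-wide defaults
ALTER TABLE users RESET (
    autovacuum_vacuum_scale_factor,
    autovacuum_analyze_scale_factor
);
```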

🔗Data Types

Type	Use For	Notes
`TEXT`	Variable-length strings	Preferred over VARCHAR(n) — no performance difference, less hassle.
`VARCHAR(n)`	Strings with length limit	Only use if you need DB-level length enforcement.
`INTEGER / INT`	32-bit integer	Max ~2.1 billion. Use BIGINT if IDs may exceed that.
`BIGINT`	64-bit integer	Preferred for IDs on tables that will grow.
`NUMERIC(p,s)`	Exact decimals (money)	No floating-point errors. Use for currency. Slower than float.
`FLOAT8 / DOUBLE PRECISION`	Approximate decimals	Fast. Imprecise. Not for money.
`BOOLEAN`	true/false	Accepts true/false and the string literals 't'/'f', 'yes'/'no', 'on'/'off', '1'/'0'.
`TIMESTAMPTZ`	Timestamps	Always use TIMESTAMPTZ (with time zone) — stores UTC internally.
`DATE`	Calendar date only	No time component.
`INTERVAL`	Duration	`INTERVAL '3 days'`, `INTERVAL '1 hour 30 minutes'`.
`UUID`	Universally unique IDs	Use `gen_random_uuid()` (built-in since Pg 13).
`JSONB`	JSON data	Binary JSON — indexable, queryable. Prefer over `JSON`.
`ARRAY`	Arrays of any type	`TEXT[]`, `INTEGER[]`. Indexable with GIN.
`ENUM`	Fixed set of string values	Enforced at DB level. Adding values requires ALTER TYPE.

Always use TIMESTAMPTZ, never TIMESTAMP. TIMESTAMP (without time zone) stores no timezone info: when you insert a value that carries an offset, Postgres silently ignores the offset, so you can no longer reason about the instant across timezones. TIMESTAMPTZ converts the input to UTC for storage and converts it to the session's time zone on display; it does not remember the original offset either.
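
The difference is easy to see by casting the same literal to both types. The results in the comments assume the default output format.

```sql
SET TIME ZONE 'UTC';

SELECT '2024-01-15 12:00:00+02'::timestamp   AS ts,    -- 2024-01-15 12:00:00    (offset dropped)
       '2024-01-15 12:00:00+02'::timestamptz AS tstz;  -- 2024-01-15 10:00:00+00 (converted to UTC)

-- The same stored instant, displayed for a different session time zone
SET TIME ZONE 'America/New_York';

SELECT '2024-01-15 12:00:00+02'::timestamptz AS tstz;  -- 2024-01-15 05:00:00-05
```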

🔗JSONB

    -- Store and query semi-structured data
    CREATE TABLE events (
        id    BIGSERIAL PRIMARY KEY,
        data  JSONB NOT NULL
    );

    -- Insert JSON
    INSERT INTO events (data) VALUES
        ('{"type": "click", "user_id": 42, "tags": ["mobile", "nav"]}');

    -- Access operators
    data->'user_id'           -- returns JSONB value: 42
    data->>'user_id'          -- returns TEXT value: '42'  (use for WHERE comparisons)
    data->'address'->'city'  -- nested access
    data#>'{address,city}'     -- path access (array of keys)
    data#>>'{address,city}'  -- path access returning TEXT

    -- Query
    SELECT * FROM events WHERE data->>'type' = 'click';
    SELECT * FROM events WHERE (data->>'user_id')::INT = 42;
    SELECT * FROM events WHERE data @> '{"type": "click"}';  -- containment (use GIN index)
    SELECT * FROM events WHERE data ? 'user_id';             -- key exists
    SELECT * FROM events WHERE data->'tags' @> '["mobile"]'; -- array contains value

    -- Update a key in JSONB
    UPDATE events SET data = data || '{"processed": true}'::jsonb WHERE id = 1;  -- merge top-level keys
    UPDATE events SET data = jsonb_set(data, '{user_id}', '99') WHERE id = 1;   -- replace one path

    -- Index for fast JSONB queries
    CREATE INDEX ON events USING gin(data);
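
The default GIN index helps containment and key-existence operators, but not comparisons on extracted text such as data->>'type' = 'click'. For those, a B-tree expression index is one option. The index names below are made up for illustration.

```sql
-- B-tree on an extracted key: serves WHERE data->>'type' = 'click'
CREATE INDEX events_type_idx ON events ((data->>'type'));

-- The cast must match the query exactly: WHERE (data->>'user_id')::INT = 42
CREATE INDEX events_user_id_idx ON events (((data->>'user_id')::INT));

-- Smaller GIN variant that supports @> but not the ?, ?|, ?& key-existence operators
CREATE INDEX events_data_path_idx ON events USING gin (data jsonb_path_ops);
```

Check with EXPLAIN that the query actually uses the index; the expression in the WHERE clause has to match the indexed expression.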

🔗ENUMs and Custom Types

    -- Create an ENUM type
    CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'cancelled');

    CREATE TABLE orders (
        id     BIGSERIAL PRIMARY KEY,
        status order_status NOT NULL DEFAULT 'pending'
    );

    -- Add a value to an existing ENUM (values can be added or renamed, not removed)
    ALTER TYPE order_status ADD VALUE 'refunded' AFTER 'delivered';
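
A few follow-up operations on the order_status type defined above, as a short sketch. The renamed spelling is only an example.

```sql
-- List the values of an ENUM in their declared order
SELECT unnest(enum_range(NULL::order_status)) AS status;

-- ENUM values compare by declared order, not alphabetically
SELECT id, status FROM orders WHERE status < 'shipped';  -- pending, processing

-- Rename a value in place (Pg 10+); existing rows follow automatically
ALTER TYPE order_status RENAME VALUE 'cancelled' TO 'canceled';
```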

🔗Window Functions

Window functions perform a calculation across a set of rows related to the current row — without collapsing them into a single group like GROUP BY would.

    -- ROW_NUMBER: number each row within a partition (ties still get distinct numbers)
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
    FROM employees;

    -- RANK and DENSE_RANK: like ROW_NUMBER, but ties get the same rank
    -- RANK skips numbers after a tie (1,1,3), DENSE_RANK doesn't (1,1,2)
    RANK() OVER (ORDER BY score DESC)
    DENSE_RANK() OVER (ORDER BY score DESC)

    -- LAG / LEAD: access previous / next row's value
    SELECT
        date,
        revenue,
        LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
        revenue - LAG(revenue) OVER (ORDER BY date) AS change
    FROM daily_revenue;

    -- Running total with SUM OVER
    SELECT
        date,
        amount,
        SUM(amount) OVER (ORDER BY date) AS running_total
    FROM transactions;

    -- Top N per group (e.g., top 3 posts per user)
    SELECT * FROM (
        SELECT *,
            ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
        FROM posts
    ) sub
    WHERE rn <= 3;
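
When several functions share the same window, a named WINDOW clause avoids repeating the definition, and a frame clause limits which rows each calculation sees. A sketch on the daily_revenue table used above; the avg_7_rows column is an illustrative addition.

```sql
SELECT
    date,
    revenue,
    LAG(revenue) OVER w           AS prev_revenue,
    revenue - LAG(revenue) OVER w AS change,
    -- moving average over the current row and the six rows before it
    AVG(revenue) OVER (w ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7_rows
FROM daily_revenue
WINDOW w AS (ORDER BY date);
```

The frame counts rows, not calendar days, so gaps in the dates shorten the period it actually covers.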

🔗Full-Text Search

    -- tsvector: processed text for searching
    -- tsquery: a search query

    -- Basic full-text search
    SELECT title FROM articles
    WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'postgres & index');

    -- Ranking results by relevance (use the same text search config on both sides)
    SELECT title,
        ts_rank(to_tsvector('english', body), to_tsquery('english', 'postgres')) AS rank
    FROM articles
    WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres')
    ORDER BY rank DESC;

    -- Store tsvector in a column for performance (trigger, or generated column on Pg 12+)
    ALTER TABLE articles ADD COLUMN search_vector TSVECTOR
        GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;

    CREATE INDEX ON articles USING gin(search_vector);

    -- plainto_tsquery: user input (no operators, just words)
    WHERE search_vector @@ plainto_tsquery('english', user_input)

    -- websearch_to_tsquery: supports "phrases", -exclusions (Pg 11+)
    WHERE search_vector @@ websearch_to_tsquery('english', '"exact phrase" -exclude')
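
Putting the pieces together, a complete ranked search might look like the sketch below. It assumes the search_vector column and GIN index from the listing above; the search string is a stand-in for user input.

```sql
SELECT title,
       ts_rank(search_vector, q) AS rank
FROM articles,
     -- build the query once and give it a name usable in WHERE and SELECT
     websearch_to_tsquery('english', 'postgres index -mysql') AS q
WHERE search_vector @@ q
ORDER BY rank DESC
LIMIT 10;
```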

🔗Generated Columns and Constraints

    -- Generated (computed) column: value is always derived from other columns (Pg 12+)
    ALTER TABLE products
        ADD COLUMN price_with_tax NUMERIC
        GENERATED ALWAYS AS (price * 1.20) STORED;

    -- Table-level constraints
    CREATE TABLE bookings (
        id         BIGSERIAL PRIMARY KEY,
        user_id    BIGINT NOT NULL,
        room_id    BIGINT NOT NULL,
        start_date DATE NOT NULL,
        end_date   DATE NOT NULL,
        CONSTRAINT valid_dates CHECK (end_date > start_date),
        CONSTRAINT unique_booking UNIQUE (user_id, start_date)
    );

    -- Exclusion constraint: no overlapping date ranges (requires btree_gist extension)
    CREATE EXTENSION IF NOT EXISTS btree_gist;
    ALTER TABLE bookings
        ADD CONSTRAINT no_overlap
        EXCLUDE USING gist (room_id WITH =, daterange(start_date, end_date) WITH &&);
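
To see an exclusion constraint at work, the sketch below uses a separate, hypothetical room_bookings table so it can be run on its own. daterange(a, b) includes a and excludes b, so back-to-back bookings do not collide.

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE room_bookings (           -- hypothetical table for illustration
    id         BIGSERIAL PRIMARY KEY,
    room_id    BIGINT NOT NULL,
    start_date DATE NOT NULL,
    end_date   DATE NOT NULL,
    CHECK (end_date > start_date),
    EXCLUDE USING gist (room_id WITH =, daterange(start_date, end_date) WITH &&)
);

INSERT INTO room_bookings (room_id, start_date, end_date)
VALUES (1, '2024-03-01', '2024-03-05');   -- accepted

INSERT INTO room_bookings (room_id, start_date, end_date)
VALUES (1, '2024-03-05', '2024-03-08');   -- accepted: the ranges only touch

INSERT INTO room_bookings (room_id, start_date, end_date)
VALUES (1, '2024-03-04', '2024-03-06');   -- rejected: overlaps existing bookings for room 1
```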

🔗psql CLI

    # Connect
    psql -U username -d dbname
    psql -U username -h hostname -p 5432 -d dbname
    psql "postgresql://user:password@host:5432/dbname"

    # Meta-commands (no semicolon needed; the trailing notes are annotations, not input)
    \l          -- list databases
    \c dbname   -- connect to database
    \dt         -- list tables in current schema
    \dt *.*     -- list tables in all schemas
    \d tablename -- describe table (columns, indexes, constraints)
    \di         -- list indexes
    \df         -- list functions
    \dv         -- list views
    \dn         -- list schemas
    \du         -- list users/roles
    \timing     -- toggle query timing
    \x          -- toggle expanded output (useful for wide tables)
    \e          -- open query in $EDITOR
    \i file.sql -- run SQL file
    \copy       -- client-side COPY (works over remote connections)
    \q          -- quit

    # Run a query from the shell
    psql -U postgres -d mydb -c "SELECT COUNT(*) FROM users;"

    # Run a SQL file
    psql -U postgres -d mydb -f schema.sql

🔗Users, Roles, and Permissions

    -- Create a role (a "user" is a role with LOGIN, a "group" is a role without it)
    CREATE ROLE app_user LOGIN PASSWORD 'securepassword';
    CREATE ROLE readonly NOLOGIN;  -- group role

    -- Grant connect and usage
    GRANT CONNECT ON DATABASE mydb TO app_user;
    GRANT USAGE ON SCHEMA public TO app_user;

    -- Grant table permissions (covers tables that exist right now)
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
    GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_user;

    -- Grant on future tables automatically
    -- Applies only to tables created by the role that runs this statement
    ALTER DEFAULT PRIVILEGES IN SCHEMA public
        GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;

    -- Read-only role pattern
    GRANT USAGE ON SCHEMA public TO readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;
    GRANT readonly TO reporting_user;  -- assign group role to a user

    -- Revoke
    REVOKE INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public FROM readonly;
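
ALTER DEFAULT PRIVILEGES only affects objects created later by the role that ran it, unless FOR ROLE names another creator. If tables are created by a dedicated migration role, the defaults have to be set for that role. In this sketch, migrator is a hypothetical role name; app_user and readonly are the roles from the listing above.

```sql
-- Tables that migrator creates in future become readable by readonly...
ALTER DEFAULT PRIVILEGES FOR ROLE migrator IN SCHEMA public
    GRANT SELECT ON TABLES TO readonly;

-- ...and fully usable by app_user, including sequences behind BIGSERIAL columns
ALTER DEFAULT PRIVILEGES FOR ROLE migrator IN SCHEMA public
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
ALTER DEFAULT PRIVILEGES FOR ROLE migrator IN SCHEMA public
    GRANT USAGE, SELECT ON SEQUENCES TO app_user;

-- In psql, the \ddp meta-command lists the default privileges now in effect
```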

🔗Backup and Restore

    # pg_dump: logical backup (SQL or custom format)
    pg_dump -U postgres mydb > mydb.sql                         # plain SQL
    pg_dump -U postgres -Fc mydb > mydb.dump                   # custom format (preferred)
    pg_dump -U postgres -Fc -t users mydb > users.dump         # single table
    pg_dump -U postgres -Fc --schema-only mydb > schema.dump   # schema only
    pg_dump -U postgres -Fc --data-only mydb > data.dump       # data only

    # Restore custom format
    pg_restore -U postgres -d mydb mydb.dump
    pg_restore -U postgres -d mydb --clean mydb.dump           # drop objects first
    pg_restore -U postgres -d mydb -j 4 mydb.dump              # parallel restore (4 workers)

    # Restore plain SQL
    psql -U postgres -d mydb < mydb.sql

    # pg_dumpall: dump all databases + roles + tablespaces
    pg_dumpall -U postgres > all.sql

    # COPY: fast bulk import/export
    # Server-side (reads/writes the server filesystem; needs superuser or
    # membership in pg_read_server_files / pg_write_server_files)
    COPY users TO '/tmp/users.csv' CSV HEADER;
    COPY users FROM '/tmp/users.csv' CSV HEADER;

    # Client-side (works over remote connections)
    \copy users TO 'users.csv' CSV HEADER
    \copy users FROM 'users.csv' CSV HEADER

🔗Useful Diagnostic Queries

    -- Show running queries
    SELECT pid, now() - query_start AS duration,
           query, state
    FROM pg_stat_activity
    WHERE state != 'idle'
    ORDER BY duration DESC;

    -- Stop a query (substitute a pid from the query above)
    SELECT pg_cancel_backend(pid);    -- cancel the current query, keep the connection
    SELECT pg_terminate_backend(pid); -- terminate the whole connection

    -- Show locks and what's blocking what
    SELECT blocked.pid, blocked.query,
           blocking.pid AS blocking_pid, blocking.query AS blocking_query
    FROM pg_stat_activity blocked
    JOIN pg_stat_activity blocking
        ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));

    -- Slowest queries (requires pg_stat_statements extension)
    -- Column names are for Pg 13+; older versions use mean_time / total_time
    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
    SELECT query, round(mean_exec_time::numeric, 2) AS avg_ms,
           calls, round(total_exec_time::numeric, 2) AS total_ms
    FROM pg_stat_statements
    ORDER BY mean_exec_time DESC
    LIMIT 20;

    -- Table sizes
    SELECT relname AS table_name,
        pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
        pg_size_pretty(pg_relation_size(relid)) AS table_size,
        pg_size_pretty(pg_indexes_size(relid)) AS index_size
    FROM pg_stat_user_tables
    ORDER BY pg_total_relation_size(relid) DESC;

    -- Unused indexes (wasting write overhead)
    -- idx_scan counts since the last stats reset; unique and PK indexes
    -- still enforce constraints even when it is 0
    SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;

pg_stat_statements is essential. Enable it in postgresql.conf with shared_preload_libraries = 'pg_stat_statements' and restart. It tracks query statistics across all executions — the single most useful tool for finding slow queries in production.
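
Average time finds slow queries; total time finds the queries that cost the server the most overall. A sketch of the second view, assuming the Pg 13+ column names (total_exec_time); the pct_of_total column is an illustrative addition.

```sql
-- Queries ranked by their share of all tracked execution time
SELECT query,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round((100 * total_exec_time
              / NULLIF(sum(total_exec_time) OVER (), 0))::numeric, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Start a fresh measurement window, e.g. right after a deploy
SELECT pg_stat_statements_reset();
```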
