Transactions: Grouping Operations That Must Succeed Together
I briefly touched on transactions in my ACID Properties post. Now let's go deeper — because transactions are the core mechanism that makes databases reliable for real-world operations.
What Is a Transaction?
A transaction is a sequence of one or more database operations (reads/writes) that are treated as a single unit of work. Either all operations in the transaction complete successfully, or none of them do.
The classic example — a bank transfer:
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1; -- debit
UPDATE accounts SET balance = balance + 500 WHERE account_id = 2; -- credit
COMMIT;
Both updates are wrapped in one transaction. If the second update fails (maybe account 2 doesn't exist), the first one gets rolled back too. The money doesn't disappear into a void.
The Three Commands: BEGIN, COMMIT, ROLLBACK
BEGIN; -- Start the transaction
-- ... your SQL operations ...
COMMIT; -- Save all changes permanently
-- OR
ROLLBACK; -- Undo everything since BEGIN
COMMIT— you're happy with the result, make it permanent.ROLLBACK— something went wrong, undo everything.
Most databases also support SAVEPOINT — partial checkpoints within a transaction you can roll back to without undoing everything:
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 1;
SAVEPOINT after_debit;
UPDATE accounts SET balance = balance + 500 WHERE account_id = 2;
-- something went wrong here...
ROLLBACK TO after_debit; -- only undo the second update
COMMIT;
Transaction States
A transaction moves through these states:
- Active — operations are being executed.
- Partially committed — all operations done, waiting to be written to disk.
- Committed — successfully saved. Permanent.
- Failed — an error occurred.
- Aborted — rolled back. Database restored to pre-transaction state.
Concurrency Problems Transactions Solve
When multiple transactions run at the same time (which they always do in production), things can go wrong without proper controls. Here are the classic problems:
Dirty Read
Transaction A reads data that Transaction B has modified but not yet committed. If B rolls back, A was reading garbage data.
Non-Repeatable Read
Transaction A reads a row. Transaction B updates that row and commits. Transaction A reads the same row again and gets a different value. Same query, different result in the same transaction.
Phantom Read
Transaction A queries rows matching a condition (e.g. age > 25). Transaction B inserts a new row that matches. Transaction A re-runs the same query and now gets an extra row that "appeared" — a phantom.
Isolation Levels
To handle these problems, databases offer different isolation levels — basically, how strictly transactions are isolated from each other. Higher isolation = fewer problems but more performance overhead.
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| READ UNCOMMITTED | ✅ Possible | ✅ Possible | ✅ Possible |
| READ COMMITTED | ❌ Prevented | ✅ Possible | ✅ Possible |
| REPEATABLE READ | ❌ Prevented | ❌ Prevented | ✅ Possible |
| SERIALIZABLE | ❌ Prevented | ❌ Prevented | ❌ Prevented |
-- Set isolation level before starting a transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
-- ...
COMMIT;
READ UNCOMMITTED
The lowest level. You can read uncommitted changes from other transactions. Fast but risky — almost never used in practice.
READ COMMITTED
The default in PostgreSQL and SQL Server. You only see data that has been committed. Prevents dirty reads. Most apps are fine with this.
REPEATABLE READ
The default in MySQL InnoDB. Once you read a row in a transaction, it won't change if another transaction updates it — you'll always get the same value for that row within your transaction.
SERIALIZABLE
The strictest. Transactions are executed as if they ran one after another (serial). Zero concurrency anomalies, but can be slow due to locking.
For most web applications, READ COMMITTED is the sweet spot. It prevents the most dangerous issues (dirty reads) while keeping things performant. Step up to REPEATABLE READ or SERIALIZABLE only when your business logic demands it.
How Databases Implement Transactions Internally
Two main techniques:
Write-Ahead Logging (WAL)
Before any change is written to the actual data file, it's first written to a log. If the system crashes mid-transaction, the database can replay or undo operations from the log on recovery. This is how durability (the D in ACID) is guaranteed.
MVCC (Multi-Version Concurrency Control)
Instead of locking rows for reads, the database keeps multiple versions of a row. Each transaction sees a consistent snapshot of the data as it was when the transaction started — even if other transactions are modifying data in parallel. This is how PostgreSQL and MySQL handle concurrency without blocking reads.
MVCC is why you can read from a PostgreSQL database under heavy write load without getting blocked. Reads don't wait for writes to finish — they just read an older snapshot.
Quick Summary
| Concept | What it means |
|---|---|
| Transaction | A group of operations that succeed or fail together |
| COMMIT | Persist all changes permanently |
| ROLLBACK | Undo all changes since BEGIN |
| Isolation Level | How much a transaction is shielded from others running in parallel |
| WAL | Log-first approach for crash recovery |
| MVCC | Multiple data versions so reads don't block writes |
Transactions are what separate a reliable production database from a fragile one. Once you understand them — and the isolation levels especially — you can make much more informed decisions about how your application handles data under concurrent load.