Indexing: How Databases Find Your Data Fast
There's a moment every developer hits: your query was fine with 1,000 rows, and then you hit 1,000,000 rows and suddenly it takes 10 seconds. Nine times out of ten, the fix is an index.
Let me break down what indexes are, how they actually work, and when to reach for them.
The Problem Without Indexes
Without an index, when you run:
SELECT * FROM users WHERE email = 'lokesh@example.com';
The database does a full table scan — it goes through every single row, one by one, and checks if the email matches. With a million users, that's a million comparisons. Slow.
An index solves this by creating a separate data structure that lets the database jump directly to the matching rows instead of scanning everything.
What Is an Index?
An index is a separate data structure (usually a B-tree) that stores a sorted copy of one or more columns, along with pointers to the actual rows.
Think of it like the index at the back of a book. Instead of reading every page to find "normalization", you look it up in the index, get the page number, and jump straight there. Same idea.
-- Creating an index on the email column
CREATE INDEX idx_users_email ON users(email);
After this, the same WHERE email = '...' query becomes drastically faster.
How Indexes Work Internally (B-Tree)
Most databases use a B-tree (Balanced Tree) structure for indexes. Here's the idea:
- The values are stored in a sorted tree structure.
- To find a value, the database traverses the tree from the root — comparing at each level and going left (smaller) or right (larger).
- Instead of scanning millions of rows, it takes just ~20 steps even for a billion rows (because
log₂(1,000,000,000) ≈ 30).
B-tree indexes are great for equality lookups (=), range queries (>, <, BETWEEN), and sorting (ORDER BY). They're the default index type in most databases.
Types of Indexes
1. Single Column Index
The most common. Index on one column.
CREATE INDEX idx_email ON users(email);
2. Composite Index (Multi-column)
Index on multiple columns together. Useful when you frequently filter by multiple columns.
CREATE INDEX idx_name_city ON users(last_name, city);
Important: A composite index on (last_name, city) helps queries that filter by last_name alone, or by last_name + city together. But it does not help a query filtering by city alone. The order matters — always put the most selective column first.
3. Unique Index
Ensures all values in the indexed column are unique. Primary keys automatically get this.
CREATE UNIQUE INDEX idx_unique_email ON users(email);
4. Full-Text Index
Used for searching inside text content (like searching blog posts or descriptions).
CREATE FULLTEXT INDEX idx_content ON articles(body);
5. Hash Index
Uses a hash function instead of a B-tree. Blazing fast for exact equality lookups, but useless for range queries or sorting.
The Cost of Indexes
Indexes aren't free. Here's what you're trading:
| Trade-off | What it means |
|---|---|
| Faster reads | Queries run significantly faster |
| Slower writes | Every INSERT/UPDATE/DELETE also updates the index |
| More storage | The index data structure takes up disk space |
| Index maintenance | Over time, indexes can get fragmented and need rebuilding |
Don't index every column "just in case". Over-indexing slows down writes and wastes storage. Index the columns you actually query frequently.
When to Use an Index
Good candidates for indexing:
- Columns used frequently in
WHEREclauses. - Columns used in
JOINconditions. - Columns used in
ORDER BYorGROUP BY. - Foreign key columns.
- Columns with high cardinality (many distinct values — like email or user ID).
Bad candidates for indexing:
- Columns with very low cardinality (like a
gendercolumn with only 2-3 values — the index barely helps). - Tables that are tiny (full scans are fine).
- Columns that are written to far more than read.
Checking if Your Query Uses an Index
Most databases have an EXPLAIN command to show the query execution plan:
EXPLAIN SELECT * FROM users WHERE email = 'lokesh@example.com';
Look for index scan or index seek — that means it's using the index. If you see seq scan or full table scan, there's no index being used for that query.
Quick Summary
| Without Index | With Index | |
|---|---|---|
| How it searches | Row by row (full scan) | Tree traversal (direct lookup) |
| Read speed | Slow on large tables | Fast |
| Write speed | Fast | Slightly slower |
| Storage | Minimal | Extra per index |
Indexing is one of those things where a 5-minute change can turn a 10-second query into a 10-millisecond one. Understanding when and how to use them is one of the most practical skills you can have as a developer working with databases.
Start by running EXPLAIN on your slowest queries. If you see full table scans on large tables with selective WHERE conditions — add an index there first. Measure before and after.