Understanding Database Internals with PostgreSQL: From Seq Scan to Index Scan

 

When working with databases in real-world systems, one of the most common performance issues is slow queries. Many developers rely on trial-and-error fixes, but understanding how the database actually executes queries is what truly differentiates an average engineer from a strong one.

In this article, we’ll walk through key database internals using PostgreSQL — covering execution plans, index usage, and core architectural concepts.


Getting Started: What is an Execution Plan?

When you run a query, the database doesn’t execute it directly. Instead, it goes through:

SQL → Parser → Rewriter → Planner/Optimizer → Executor

The planner (optimizer) compares the possible ways to execute your query and picks the one with the lowest estimated cost.

To see this decision, we use:

EXPLAIN SELECT * FROM test."Employee";

Sequential Scan (Seq Scan)

Example output:

Seq Scan on "Employee"  (cost=0.00..1736.00 rows=100000 width=27)

What does this mean?

  • Seq Scan → Database scans entire table row by row

  • cost → Estimated cost in planner units, shown as startup cost .. total cost (not actual time)

  • rows → Estimated number of rows returned

  • width → Average row size in bytes

When does this happen?

  • When querying the entire table

  • When no useful index exists

  • When the optimizer thinks scanning everything is cheaper

👉 Key Insight:

Sequential scan is NOT bad: it is often the cheapest plan when a query needs most or all of a table.
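Estimates and reality can differ. EXPLAIN ANALYZE actually executes the query and prints real timings next to the estimates (the numbers below are illustrative and will vary per machine):

```sql
-- Actually runs the query; reports real time and row counts alongside estimates
EXPLAIN ANALYZE SELECT * FROM test."Employee";
-- Seq Scan on "Employee"  (cost=0.00..1736.00 rows=100000 width=27)
--   (actual time=0.010..9.500 rows=100000 loops=1)
```

Use EXPLAIN ANALYZE with care on UPDATE/DELETE statements — the statement really runs, so wrap it in a transaction and roll back if needed.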


Index Scan (Faster Lookup)

Example:

EXPLAIN SELECT * FROM test."Employee" WHERE id = 10;

Output:

Index Scan using "Employee_pkey" on "Employee"
(cost=0.29..8.31 rows=1 width=27)

What’s happening?

  • Database uses a B-Tree index

  • It directly jumps to the matching row

  • Avoids scanning entire table

👉 Key Insight:

Index scans pay off when a query fetches a small, selective subset of rows.


Why Index Is Not Always Used

Even if an index exists:

SELECT * FROM test."Employee";

👉 Still results in Seq Scan

Why?

Because:

  • Query needs 100% of rows

  • One sequential read of the whole table is cheaper than 100,000 random index lookups

👉 Golden Rule:

  • Seq Scan → large data access

  • Index Scan → selective queries
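One way to convince yourself of this rule is to temporarily discourage sequential scans and compare the estimated costs. `enable_seqscan` is a real session-level planner setting, but treat this as a debugging experiment, not a production fix:

```sql
SET enable_seqscan = off;   -- make seq scans look very expensive to the planner
EXPLAIN SELECT * FROM test."Employee";
-- The plan switches to the index, but with a far higher estimated cost

SET enable_seqscan = on;    -- restore the default
```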


Importance of ANALYZE

Right after data is loaded, PostgreSQL may be working from missing or stale statistics and show wrong estimates:

rows=850   (incorrect)

After running:

ANALYZE test."Employee";

It becomes:

rows=1   (accurate)

Why?

Because ANALYZE:

  • Collects table statistics

  • Helps optimizer make better decisions

👉 Without it:

You may get inefficient query plans
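The statistics that ANALYZE gathers are queryable. A quick way to inspect them for our table, using the standard `pg_stats` view:

```sql
-- Per-column statistics used by the planner
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE schemaname = 'test' AND tablename = 'Employee';
```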


VACUUM and MVCC

PostgreSQL uses MVCC (Multi-Version Concurrency Control).

What happens during UPDATE/DELETE?

  • Old rows are not removed immediately

  • New versions are created

  • Old versions become “dead tuples”

Why is VACUUM needed?

VACUUM test."Employee";

  • Cleans dead rows

  • Prevents table bloat

  • Updates visibility map

👉 Important for:

  • Performance

  • Index-only scans
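Dead tuples are visible through the statistics views, which makes bloat easy to monitor. The counts are approximate:

```sql
-- Approximate live vs. dead tuple counts for the table
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'Employee';
```

A steadily growing n_dead_tup after heavy UPDATE/DELETE traffic is a sign that VACUUM (or autovacuum) needs to catch up.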


Autovacuum: Do You Need to Run It Manually?

PostgreSQL has autovacuum, which automatically runs:

  • VACUUM

  • ANALYZE

But manual execution is needed when:

  • Bulk data insert/update

  • Query performance issues

  • Debugging execution plans
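To check whether autovacuum has actually visited a table, and to combine both maintenance steps after a bulk load:

```sql
-- When vacuum last ran, manually and via autovacuum
SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'Employee';

-- After a bulk insert/update, doing both at once is common
VACUUM (ANALYZE) test."Employee";
```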


Index Only Scan (Advanced)

Sometimes, PostgreSQL can avoid reading the table entirely:

EXPLAIN SELECT id FROM test."Employee";

👉 Expected:

Index Only Scan

But you may still see a Seq Scan.

Because:

  • Visibility map is not updated

  • VACUUM hasn’t run

👉 Fix:

VACUUM test."Employee";

Primary Key and Index

When you define:

id PRIMARY KEY

PostgreSQL automatically creates:

CREATE UNIQUE INDEX "Employee_pkey" ON "Employee"(id);

👉 Meaning:

  • Primary key = constraint + index

  • Used for fast lookups
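The same mechanism works for any column you query selectively. As a sketch, assuming the table has a `name` column (not shown earlier):

```sql
-- Hypothetical: secondary index on a non-key column
CREATE INDEX employee_name_idx ON test."Employee" (name);

-- Selective lookups on name can now use an Index Scan
EXPLAIN SELECT * FROM test."Employee" WHERE name = 'Alice';
```

Keep the trade-off in mind: every index speeds up some reads but must be maintained on every write.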


Core Database Architecture Concepts

To truly understand databases, you should know:


1. Storage (Heap & Pages)

  • Data is stored in fixed-size pages (8 KB by default)

  • Tables = collection of pages
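These page-level numbers are exposed in the `pg_class` catalog (they are estimates, refreshed by VACUUM and ANALYZE):

```sql
-- Estimated page and row counts for the table
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'Employee';
```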


2. MVCC

  • Enables high concurrency

  • Avoids read locks

  • Requires VACUUM


3. Query Optimizer

  • Cost-based decision making

  • Chooses best execution plan


4. Indexing

  • B-Tree (default)

  • Improves lookup performance

  • Comes with write overhead


5. Joins

  • Nested Loop

  • Hash Join

  • Merge Join
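Which join method the planner picks depends on table sizes, indexes, and statistics. As a sketch, assuming a hypothetical test."Department" table referenced by a department_id column:

```sql
-- Hypothetical join; the top plan node will be
-- Nested Loop, Hash Join, or Merge Join depending on the data
EXPLAIN
SELECT e.*, d.*
FROM test."Employee" e
JOIN test."Department" d ON d.id = e.department_id;
```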


6. WAL (Write-Ahead Logging)

  • Ensures durability

  • Used for crash recovery
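You can watch the WAL position advance as changes are committed (this function is available in PostgreSQL 10 and later):

```sql
-- Current write-ahead log insert position; moves forward with every committed change
SELECT pg_current_wal_lsn();
```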


Key Takeaways

  • EXPLAIN is your best debugging tool

  • Indexes are not always faster

  • Statistics drive optimizer decisions

  • VACUUM and ANALYZE are critical for performance

  • Understanding internals helps in real production scenarios


Final Thoughts

Learning database internals is not about memorizing commands — it’s about building intuition:

“Why is the database choosing this plan?”

Once you start thinking like the optimizer, debugging performance issues becomes much easier.


If you’re exploring database internals, start experimenting:

  • Insert large datasets

  • Compare execution plans

  • Add/remove indexes

  • Observe behavior

That’s where real learning happens.

